SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: National Starch and Chemical Investment Holding Corporation

(B) STREET: 501 Silverside Road, Suite 27 

(C) CITY: Wilmington 

(D) STATE: Delaware 

(E) COUNTRY: United States of America 

(F) POSTAL CODE (ZIP): 19809 

(ii) TITLE OF INVENTION: Improvements in or Relating to Plant Starch Composition

(iii) NUMBER OF SEQUENCES: 20 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

AAGGATCCGT CGACATCGAT AATACGACTC ACTATAGGGA TTTTTTTTTT 
TTTTTTT 57 

(2) INFORMATION FOR SEQ ID NO: 2: 
(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
AAGGATCCGT CGACATC 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
GACATCGATA ATACGAC 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
CATCCAACCA CCATCTCGCA 
(2) INFORMATION FOR SEQ ID NO: 5: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TTGAGAGAAG ATACCTAAGT 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
ATGTTCAGTC CATCTAAAGT 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
AGAACAACAA TTCCTAGCTC 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
GGGGCCTTGA ACTCAGCAAT 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CGTCCCAGCA TTCGACATAA 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 



CTTGGATCCT TGAACTCAGC AATTTG 26



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
TAACTCGAGC AACGCGATCA CAAGTTCGT 29 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3003 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

GATGGGGCCT TGAACTCAGC AATTTGACAC TCAGTTAGTT ACACTGCCAT 
CACTTATCAG 60 

ATCTCTATTT TTTCTCTTAA TTCCAACCAA GGAATGAATA AAAAGATAGA 
TTTGTAAAAA 120 

CCCTAAGGAG AGAAGAAGAA AGATGGTGTA TACACTCTCT GGAGTTCGTT 
TTCCTACTGT 180 

TCCATCAGTG TACAAATCTA ATGGATTCAG CAGTAATGGT GATCGGAGGA 
ATGCTAATAT 240 



TTCTGTATTC TTGAAAAAAC ACTCTCTTTC ACGGAAGATC TTGGCTGAAA 
AGTCTTCTTA 300 

CAATTCCGAA TCCCGACCTT CTACAATTGC AGCATCGGGG AAAGTCCTTG 
TGCCTGGAAT 360 

CCAGAGTGAT AGCTCCTCAT CCTCAACAGA TCAATTTGAG TTCGCTGAGA 
CATCTCCAGA 420 

AAATTCCCCA GCATCAACTG ATGTAGATAG TTCAACAATG GAACACGCTA 
GCCAGATTAA 480 

AACTGAGAAC GATGACGTTG AGCCGTCAAG TGATCTTACA 
GGAAGTGTTG AAGAGCTGGA 540 

TTTTGCTTCA TCACTACAAC TACAAGAAGG TGGTAAACTG GAGGAGTCTA 
AAACATTAAA 600 

TACTTCTGAA GAGACAATTA TTGATGAATC TGATAGGATC AGAGAGAGGG 
GCATCCCTCC 660 

ACCTGGACTT GGTCAGAAGA TTTATGAAAT AGACCCCCTT TTGACAAACT 
ATCGTCAACA 720 

CCTTGATTAC AGGTATTCAC AGTACAAGAA ACTGAGGGAG GCAATTGACA 
AGTATGAGGG 780 

TGGTTTGGAA GCTTTTTCTC GTGGTTATGA AAGAATGGGT TTCACTCGTA 
GTGCTACAGG 840 

TATCACTTAC CGTGAGTGGG CTCCTGGTGC CCAGTCAGCT 
GCCCTCATTG GGGATTTCAA 900 

CAATTGGGAC GCAAATGCTG ACTTTATGAC TCGGAATGAA TTTGGTGTCT 
GAGAGATTTT 960 

TCTGCCAAAT AATGTGGATG GTTCTCCTGC AATTCCTCAT GGGTCCAGAG 
TGAAGATACG 1020 

TATGGACACT CCATCAGGTG TTAAGGATTC CATTCCTGCT TGGATCAACT 
ACTCTTTACA 1080 

GCTTCCTGAT GAAATTCCAT ATAATGGAAT ATATTATGAT CCACCCGAAG 
AGGAGAGGTA 1140 



TATCTTCCAA CACCCACGGC CAAAGAAACC AAAGTCGGTG AGAATATATG 
AATCTCATAT 1200 

TGGAATGAGT AGTCCGGAGC CTAAAATTAA CTCATACGTG AATTTTAGAG 
ATGAAGTTCT 1260 

TCCTCGCATA AAAAAAGCTT GGGTACAATG CGGTGCAAAT TATGGCTATT 
CAAGAGCATT 1320 

CTTATTATGC TAGTTTTGGT TATCATGTCA CAAATTTTTT TGCACCAAGC
AGCCGTTTTG 1380 

GAACGCCCGA CGACCTTAAG TCTTTGATTG ATAAAGCTCA TGAGCTAGGA 
ATTGTTGTTC 1440 

TCATGGACAT TGTTCACAGC CATGCATCAA ATAATACTTT AGATGGACTG 
AACATGTTTG 1500 

ACGGCACAGA TAGTTGTTAC TTTCACTCTG GAGCTCGTGG TTATCATTGG 
ATGTGGGATT 1560 

TCCGCCTCTT TAACTATGGA AACTGGGAGG TACTTAGGTA TCTTCTCTCA 
AATGCGAGAT 1620 

GGTGGTTGGA TGAGTTCAAA TTTGATGGAT TTAGATTTGA TGGTGTGACA 
TCAATGATGT 1680 

GTACTCACCA CGGATTATCG GTGGGATTCA CTGGGAACTA 
CGAGGAATAC TTTGGACTCG 1740

CAACTGATGT GGATGCTGTT GTGTATCTGA TGCTGGTCAA CGATCTTATT 
CATGGGCTTT 1800 

TCCCAGATGC AATTACCATT GGTGAAGATG TTAGCGGAAT GCCGACATTT 
TGTGTTCCCG 1860 

TTCAAGATGG GGGTGTTGGC TTTGACTATC GGCTGCATAT GGCAATTGCT 
GATAAATGGA 1920 

TTGAGTTGCT CAAGAAACGG GATGAGGATT GGAGAGTGGG 
TGATATTGTT CATACACTGA 1980

CAAATAGAAG ATGGTCGGAA AAGTGTGTTT CATACGCTGA AAGTCATGAT 
CAAGCTCTAG 2040 



TCGGTGATAA AACTATAGCA TTCTGGCTGA TGGACAAGGA TATGTATGAT 
TTTATGGCTC 2100 

TGGATAGACC GTCAACATCA TTAATAGATC GTGGGATAGC ATTACACAAG 
ATGATTAGGC 2160 

TTGTAACTAT GGGATTAGGA GGAGAAGGGT ACCTAAATTT CATGGGAAAT 
GAATTCGGCC 2220 

ACCCTGAGTG GATTGATTTC CCTAGGGCTG AACAACACCT CTCTGATGGC 
TCAGTAATTC 2280 

CCAGAAACCA ATTCAGTTAT GATAAATGCA GACGGAGATT TGACCTGGGA 
GATGCAGAAT 2340 

ATTTAAGATA CCGTGGGTTG CAAGAATTTG ACCGGGCTAT GCAGTATCTT 
GAAGATAAAT 2400 

ATGAGTTTAT GACTTCAGAA CACCAGTTCA TATCACGAAA GGATGAAGGA 
GATAGGATGA 2460 

TTGTATTTGA AAAAGGAAAC CTAGTTTTTG TCTTTAATTT TCACTGGACA 
AAAGGCTATT 2520 

CAGACTATCG CATAGGCTGC CTGAAGCCTG GAAAATACAA 
GGTTGCCTTG GACTCAGATG 2580 

ATCCACTTTT TGGTGGCTTC GGGAGAATTG ATCATAATGC CGAATATTTC 
ACCTTTGAAG 2640 

GATGGTATGA TGATCGTCCT CGTTCAATTA TGGTGTATGC ACCTAGTAGA 
ACAGCAGTGG 2700 

TCTATGCACT AGTAGACAAA GAAGAAGAAG AAGAAGAAGA AGTAGCAGTA 
GTAGAAGAAG 2760 

TAGTAGTAGA AGAAGAATGA ACGAACTTGT GATCGCGTTG AAAGATTTGA 
ACGCCACATA 2820 

GAGCTTCTTG ACGTATCTGG CAATATTGCA TTAGTCTTGG CGGAATTTCA 
TGTGACAACA 2880 

GGTTTGCAAT TCTTTCCACT ATTAGTAGTG CAACGATATA CGCAGAGATG 
AAGTGCTGAA 2940 



CAAAAACATA TGTAAAATCG ATGAATTTAT GTCGAATGCT GGGACGATCG 
AATTCCTGCA 3000 

GCC 3003 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2975 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

TTGATGGGCC TTGAACTCAG CAATTTGACA CTCAGTTAGT TACACTCCTA 
TCACTTATCA 60 

GATCTCTATT TTTTCTCTTA ATTCCAACCA GGGGAATGAA TAAAAGGATA 
GATTTGTAAA 120 

AACCCTAAGG AGAGAAGAAG AAAGATGGTG TATATACTCT CTGGAGTTCG 
TTTTCCTACT 180 

GTTCCATCAG TGTACAAATC TAATGGATTC AGCAGTAATG GTGATCGGAG 
GAATGCTAAT 240 

GTTTCTGTAT TCTTGAAAAA GCACTCTCTT TCACGGAAGA TCTTGGCTGA 
AAAGTCTTCT 300 

TACAATTCCG AATTCCGACC TTCTACAGTT GCAGCATCGG GGAAAGTCCT 
TGTGCCTGGA 360 

ACCCAGAGTG ATAGCTCCTC ATCCTCAACA GACCAATTTG AGTTCACTGA 
GACATCTCCA 420 

GAAAATTCCC CAGCATCAAC TGATGTAGAT AGTTCAACAA TGGAACACGC 
TAGCCAGATT 480 

AAAACTGAGA ACGATGACGT TGAGCCGTCA AGTGATCTTA 
CAGGAAGTGT TGAAGAGCTG 540 



GATTTTGCTT CATCACTACA ACTACAAGAA GGTGGTAAAC TGGAGGAGTC 
TAAAACATTA 600 

AATACTTCTG AAGAGACAAT TATTGATGAA TCTGATAGGA TCAGAGAGAG 
GGGCATCCCT 660 

CCACCTGGAC TTGGTCAGAA GATTTATGAA ATAGACCCCC TTTTGACAAA 
CTATCGTCAA 720 

CACCTTGATT ACAGGTATTC ACAGTACAAG AAACTGAGGG AGGCAATTGA 
CAAGTATGAG 780 

GGTGGTTTGG AAGCTTTTCT CGTGGTTATG AAAAAATGGG TTTCACTCGT 
AGTGCTACAG 840 

GTATCACTTA CCGTGAGTGG GCTCCTGGTG CCCAGTCAGC 
TGCCCTCATT GGAGATTTCA 900 

ACAATTGGGA CGCAAATGCT GACATTATGA CTCGGAATGA ATTTGGTGTC 
TGGGAGATTT 960 

TTCTGCCAAA TAATGTGGAT GGTTCTCCTG CAATTCCTCA TGGGTCCAGA 
GTGAAGATAC 1020 

GTATGGACAC TCCATCAGGT GTTAAGGATT CCATTCCTGC TTGGATCAAC 
TACTCTTTAC 1080 

AGCTTCCTGA TGAAATTCCA TATAATGGAA TATATTATGA TCCACCCGAA 
GAGGAGAGGT 1140 

ATATCTTCCA ACACCCACGG CCAAAGAAAC CAAAGTCGCT GAGAATATAT 
GAATCTCATA 1200 

TTGGAATGAG TAGTCCGGAG CCTAAAATTA ACTCATACGT GAATTTTAGA 
GATGAAGTTC 1260 

TTCCTCGCAT AAAAAAGCTT GGGTACAATG CGCTGCGAAT TATGGCTATT 
CAAGAGCATT 1320 

CTTATTATGC TAGTTTTGGT TATCATGTCA CAAATTTTTT TGCACCAAGC 
AGCCGTTTTG 1380 

GAACGCCCGA CGACCTTAAG TCTTCGATTG ATAAAGCTCA TGAGCTAGGA 
ATTGTTGTTC 1440 



TCATGGACAT CGTTCACAGC CATGCATCAA ATAATACTTT AGATGGACTG 
AACATGTTTG 1500 

ACGGCACCGA TAGTTGTTAC TTTCACTCTG GAGCTCGTGG TTATCATTGG 
ATGTGGGATT 1560 

CCGCCTCTTT AACTATGGAA ACTGGGAGGT ACTTAGGTAT CTTCTCTCAA 
ATGCGAGATG 1620 

GTGGTTGGAT GAGTTCAAAT TTGATGGATT TAGATTCGAT GGTGTGACAT 
CAATGATGTA 1680 

TACTCACCAC GGATTATCGG TGGGATTCAC TGGGAACTAC GAGGAATACT 
TTGGACTCGC 1740 

AACTGATGTG GATGCTGTTG TGTATCTGAT GCTGGTCAAC GATCTTATTC 
ATAGGCTTTT 1800 

CCCAGATGCA ATTACCATTG GTGAAGATGT TAGCGGAATG CCGACATTTT 
GTATTCCCGT 1860 

TCAAGATGGG GGTGTTGGCT TTGACTATCG GCTGCATATG 
GCAATTGCTG ATAAATGGAT 1920 

TGAGTTGCTC AAGAAACGGG ATGAGGATTG GAGAGTGGGT 
GATATTGTTC ATACACTGAC 1980

AAATAGAAGA TGGTCGGAAA AGTGTGTTTC ATACGCTGAA AGTCATGATC 
AAGCTCTAGT 2040 

CGGTGATAAA ACTATAGCAT TCTGGCTGAT GGACAAGGAT ATGTATGATT 
TTATGGCTCT 2100 

GGATAGACCG CCAACATCAT TAATAGATCG TGGGATAGCA TTGCACAAGA 
TGATTAGGCT 2160 

TGTAACTATG GGATTAGGAG GAGAAGGGTA CCTAAATTTC ATGGGAAATG 
AATTCGGCCA 2220 

CCCTGAGTGG ATTGATTTCC CTAGGGCTGA GCCACACCTT TCTGATGGCT 
CAGTAATTCC 2280 

CGGAAACCAA TTCAGTTATG ATAAATGCAG ACGGAGATTT 
GACCTGGGAG ATGCAGAATA 2340 



TTTAAGATAC CATGGGTTAC AAGAATTTGA CTGGGCTATG CAGTATCTTG 
AAGATAAATA 2400 

TGAGTTTATG ACTTCAGAAC ACCAGTTCAT ATCACGAAAG GATGAAGGAG 
ATAGGATGAT 2460 

TGTATTTGAA AGAGGAAACC TAGTTTTCGT CTTTAATTTT CACTGGACAA 
ATAGCTATTC 2520 

AGACTATCGC ATAGGCTGCC TGAAGCCTGG AAAATACAAG 
GTTGTCTTGG ACTCAGATGA 2580 

TCCACTTTTT GGTGGCTTCG GGAGAATTGA TCATAATGCC GAATATTTCA 
CCTCTGAAGG 2640 

ATCGTATGAT GATCGTCCTT GTTCAATTAT GGTGTATGCA CCTAGTAGAA 
CAGCAGTGGT 2700 

CTATGCACTA GTAGACAAAC TAGAAGTAGC AGTAGTAGAA GAACCCATTG 
AAGAATGAAC 2760 

GAACTTGTGA TCGCGTTGAA AGATTTGAAC GTTACTTGGT CATCCACATA 
GAGCTTCTTG 2820 

ACATCAGTCT TGGCGGAATT GCATGTGACA ACAAGGTTTG CAGTTCTTTC 
CACTATTAGT 2880 

AGTCCACCGA TATACGCAGA GATGAAGTGC TGAACAAACA TATGTAAAAT 
CGATGAATTT 2940 

ATGTCGAATG CTGGGACGAT CGAATTCCTG CAGCC 
2975 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3033 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION:145..2790 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

TTGATGGGGC CTTGAACTCA GCAATTTGAC ACTCAGTTAG TTACACTCCT 
ATCACTTATC 60 

AGATCTCTAT TTTTTCTCTT AATTCCAACC AAGGAATGAA TAAAAGGATA 
GATTTGTAAA 120 

AACCCTAAGG AGAGAAGAAG AAAG ATG GTG TAT ACA CTC TCT GGA 
GTT CGT 171

Met Val Tyr Thr Leu Ser Gly Val Arg 
1 5 

TTT CCT ACT GTT CCA TCA GTG TAC AAA TCT AAT GGA TTC AGC AGT 
AAT 219 

Phe Pro Thr Val Pro Ser Val Tyr Lys Ser Asn Gly Phe Ser Ser Asn 
10 15 20 25 

GGT GAT CGG AGG AAT GCT AAT GTT TCT GTA TTC TTG AAA AAG CAC 
TCT 267 

Gly Asp Arg Arg Asn Ala Asn Val Ser Val Phe Leu Lys Lys His Ser 
30 35 40 

CTT TCA CGG AAG ATC TTG GCT GAA AAG TCT TCT TAC AAT TCC GAA 
TTC 315 

Leu Ser Arg Lys Ile Leu Ala Glu Lys Ser Ser Tyr Asn Ser Glu Phe
45 50 55 

CGA CCT TCT ACA GTT GCA GCA TCG GGG AAA GTC CTT GTG CCT GGA 
ACC 363 

Arg Pro Ser Thr Val Ala Ala Ser Gly Lys Val Leu Val Pro Gly Thr 
60 65 70 

CAG AGT GAT AGC TCC TCA TCC TCA ACA GAC CAA TTT GAG TTC ACT 
GAG 411 

Gln Ser Asp Ser Ser Ser Ser Ser Thr Asp Gln Phe Glu Phe Thr Glu
75 80 85 

ACA TCT CCA GAA AAT TCC CCA GCA TCA ACT GAT GTA GAT AGT TCA 
ACA 459 

Thr Ser Pro Glu Asn Ser Pro Ala Ser Thr Asp Val Asp Ser Ser Thr 
90 95 100 105 



ATG GAA CAC GCT AGC CAG ATT AAA ACT GAG AAC GAT GAC GTT GAG 
CCG 507 

Met Glu His Ala Ser Gln Ile Lys Thr Glu Asn Asp Asp Val Glu Pro
110 115 120 

TCA AGT GAT CTT ACA GGA AGT GTT GAA GAG CTG GAT TTT GCT TCA 
TCA 555 

Ser Ser Asp Leu Thr Gly Ser Val Glu Glu Leu Asp Phe Ala Ser Ser 
125 130 135 

CTA CAA CTA CAA GAA GGT GGT AAA CTG GAG GAG TCT AAA ACA TTA 
AAT 603 

Leu Gln Leu Gln Glu Gly Gly Lys Leu Glu Glu Ser Lys Thr Leu Asn
140 145 150 

ACT TCT GAA GAG ACA ATT ATT GAT GAA TCT GAT AGG ATC AGA GAG 
AGG 651 

Thr Ser Glu Glu Thr Ile Ile Asp Glu Ser Asp Arg Ile Arg Glu Arg
155 160 165 

GGC ATC CCT CCA CCT GGA CTT GGT CAG AAG ATT TAT GAA ATA GAC 
CCC 699 

Gly Ile Pro Pro Pro Gly Leu Gly Gln Lys Ile Tyr Glu Ile Asp Pro
170 175 180 185 

CTT TTG ACA AAC TAT CGT CAA CAC CTT GAT TAC AGG TAT TCA CAG 
TAC 747 

Leu Leu Thr Asn Tyr Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr
190 195 200 

AAG AAA CTG AGG GAG GCA ATT GAC AAG TAT GAG GGT GGT TTG GAA 
GCC 795 

Lys Lys Leu Arg Glu Ala Ile Asp Lys Tyr Glu Gly Gly Leu Glu Ala
205 210 215 

TTT TCT CGT GGT TAT GAA AAA ATG GGT TTC ACT CGT AGT GCT ACA 
GGT 843 

Phe Ser Arg Gly Tyr Glu Lys Met Gly Phe Thr Arg Ser Ala Thr Gly 
220 225 230 

ATC ACT TAC CGT GAG TGG GCT CTT GGT GCC CAG TCA GCT GCC CTC 
ATT 891 

Ile Thr Tyr Arg Glu Trp Ala Leu Gly Ala Gln Ser Ala Ala Leu Ile
235 240 245 



GGA GAT TTC AAC AAT TGG GAC GCA AAT GCT GAC ATT ATG ACT CGG 
AAT 939 

Gly Asp Phe Asn Asn Trp Asp Ala Asn Ala Asp Ile Met Thr Arg Asn
250 255 260 265 

GAA TTT GGT GTC TGG GAG ATT TTT CTG CCA AAT AAT GTG GAT GGT 
TCT 987 

Glu Phe Gly Val Trp Glu Ile Phe Leu Pro Asn Asn Val Asp Gly Ser
270 275 280 

CCT GCA ATT CCT CAT GGG TCC AGA GTG AAG ATA CGT ATG GAC ACT 
CCA 1035 

Pro Ala Ile Pro His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro
285 290 295 

TCA GGT GTT AAG GAT TCC ATT CCT GCT TGG ATC AAC TAC TCT TTA 
CAG 1083 

Ser Gly Val Lys Asp Ser Ile Pro Ala Trp Ile Asn Tyr Ser Leu Gln
300 305 310 

CTT CCT GAT GAA ATT CCA TAT AAT GGA ATA CAT TAT GAT CCA CCC 
GAA 1131 

Leu Pro Asp Glu Ile Pro Tyr Asn Gly Ile His Tyr Asp Pro Pro Glu
315 320 325 

GAG GAG AGG TAT ATC TTC CAA CAC CCA CGG CCA AAG AAA CCA AAG 
TCG 1179 

Glu Glu Arg Tyr Ile Phe Gln His Pro Arg Pro Lys Lys Pro Lys Ser
330 335 340 345 

CTG AGA ATA TAT GAA TCT CAT ATT GGA ATG AGT AGT CCG GAG CCT 
AAA 1227 

Leu Arg Ile Tyr Glu Ser His Ile Gly Met Ser Ser Pro Glu Pro Lys
350 355 360 

ATT AAC TCA TAC GTG AAT TTT AGA GAT GAA GTT CTT CCT CGC ATA 
AAA 1275 

Ile Asn Ser Tyr Val Asn Phe Arg Asp Glu Val Leu Pro Arg Ile Lys
365 370 375 

AAG CTT GGG TAC AAT GCG CTG CAA ATT ATG GCT ATT CAA GAG CAT 
TCT 1323 

Lys Leu Gly Tyr Asn Ala Leu Gln Ile Met Ala Ile Gln Glu His Ser
380 385 390 



TAT TAC GCT AGT TTT GGT TAT CAT GTC ACA AAT TTT TTT GCA CCA 
AGC 1371 

Tyr Tyr Ala Ser Phe Gly Tyr His Val Thr Asn Phe Phe Ala Pro Ser 
395 400 405 

AGC CGT TTT GGA ACG CCC GAC GAC CTT AAG TCT TTG ATT GAT AAA 
GCT 1419 

Ser Arg Phe Gly Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala
410 415 420 425 

CAT GAG CTA GGA ATT GTT GTT CTC ATG GAC ATT GTT CAC AGC CAT 
GCA 1467 

His Glu Leu Gly Ile Val Val Leu Met Asp Ile Val His Ser His Ala
430 435 440 

TCA AAT AAT ACT TTA GAT GGA CTG AAC ATG TTT GAC TGC ACC GAT 
AGT 1515 

Ser Asn Asn Thr Leu Asp Gly Leu Asn Met Phe Asp Cys Thr Asp Ser 
445 450 455 

TGT TAC TTT CAC TCT GGA GCT CGT GGT TAT CAT TGG ATG TGG GAT 
TCC 1563 

Cys Tyr Phe His Ser Gly Ala Arg Gly Tyr His Trp Met Trp Asp Ser 
460 465 470 

CGC CTC TTT AAC TAT GGA AAC TGG GAG GTA CTT AGG TAT CTT CTC 
TCA 1611 

Arg Leu Phe Asn Tyr Gly Asn Trp Glu Val Leu Arg Tyr Leu Leu Ser 
475 480 485 

AAT GCG AGA TGG TGG TTG GAT GCG TTC AAA TTT GAT GGA TTT AGA 
TTT 1659 

Asn Ala Arg Trp Trp Leu Asp Ala Phe Lys Phe Asp Gly Phe Arg Phe 
490 495 500 505 

GAT GGT GTG ACA TCA ATG ATG TAT ATT CAC CAC GGA TTA TCG GTG 
GGA 1707 

Asp Gly Val Thr Ser Met Met Tyr Ile His His Gly Leu Ser Val Gly
510 515 520 

TTC ACT GGG AAC TAC GAG GAA TAC TTT GGA CTC GCA ACT GAT GTG 
GAT 1755 

Phe Thr Gly Asn Tyr Glu Glu Tyr Phe Gly Leu Ala Thr Asp Val Asp 
525 530 535 



GCT GTT GTG TAT CTG ATG CTG GTC AAC GAT CTT ATT CAT GGG CTT 
TTC 1803 

Ala Val Val Tyr Leu Met Leu Val Asn Asp Leu Ile His Gly Leu Phe
540 545 550 

CCA GAT GCA ATT ACC ATT GGT GAA GAT GTT AGC GGA ATG CCG ACA 
TTT 1851 

Pro Asp Ala Ile Thr Ile Gly Glu Asp Val Ser Gly Met Pro Thr Phe
555 560 565 

TGT ATT CCC GTC CAA GAG GGG GGT GTT GGC TTT GAC TAT CGG CTG 
CAT 1899 

Cys Ile Pro Val Gln Glu Gly Gly Val Gly Phe Asp Tyr Arg Leu His
570 575 580 585 

ATG GCA ATT GCT GAT AAA CGG ATT GAG TTG CTC AAG AAA CGG GAT 
GAG 1947 

Met Ala Ile Ala Asp Lys Arg Ile Glu Leu Leu Lys Lys Arg Asp Glu
590 595 600 

GAT TGG AGA GTG GGT GAT ATT GTT CAT ACA CTG ACA AAT AGA AGA 
TGG 1995 

Asp Trp Arg Val Gly Asp Ile Val His Thr Leu Thr Asn Arg Arg Trp
605 610 615 

TCG GAA AAG TGT GTT TCA TAC GCT GAA AGT CAT GAT CAA GCT CTA 
GTC 2043 

Ser Glu Lys Cys Val Ser Tyr Ala Glu Ser His Asp Gln Ala Leu Val
620 625 630

GGT GAT AAA ACT ATA GCA TTC TGG CTG ATG GAC AAG GAT ATG TAT 
GAT 2091 

Gly Asp Lys Thr Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp
635 640 645

TTT ATG GCT CTG GAT AGA CCG TCA ACA TCA TTA ATA GAT CGT GGG 
ATA 2139 

Phe Met Ala Leu Asp Arg Pro Ser Thr Ser Leu Ile Asp Arg Gly Ile
650 655 660 665 

GCA TTG CAC AAG ATG ATT AGG CTT GTA ACT ATG GGA TTA GGA GGA 
GAA 2187 

Ala Leu His Lys Met Ile Arg Leu Val Thr Met Gly Leu Gly Gly Glu
670 675 680 



GGG TAC CTA AAT TTC ATG GGA AAT GAA TTC GGC CAC CCT GAG TGG 
ATT 2235 

Gly Tyr Leu Asn Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile
685 690 695 

GAT TTC CCT AGG GCT GAA CAA CAC CTC TCT GAT GGC TCA GTA ATC 
CCC 2283 

Asp Phe Pro Arg Ala Glu Gln His Leu Ser Asp Gly Ser Val Ile Pro
700 705 710 

GGA AAC CAA TTC AGT TAT GAT AAA TGC AGA CGG AGA TTT GAC CTG 
GGA 2331 

Gly Asn Gln Phe Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly
715 720 725 

GAT GCA GAA TAT TTA AGA TAC CGT GGG TTG CAA GAA TTT GAC CGG 
CCT 2379 

Asp Ala Glu Tyr Leu Arg Tyr Arg Gly Leu Gln Glu Phe Asp Arg Pro
730 735 740 745

ATG CAG TAT CTT GAA GAT AAA TAT GAG TTT ATG ACT TCA GAA CAC 
CAG 2427 

Met Gln Tyr Leu Glu Asp Lys Tyr Glu Phe Met Thr Ser Glu His Gln
750 755 760 

TTC ATA TCA CGA AAG GAT GAA GGA GAT AGG ATG ATT GTA TTT GAA 
AAA 2475 

Phe Ile Ser Arg Lys Asp Glu Gly Asp Arg Met Ile Val Phe Glu Lys
765 770 775 

GGA AAC CTA GTT TTT GTC TTT AAT TTT CAC TGG ACA AAA AGC TAT 
TCA 2523 

Gly Asn Leu Val Phe Val Phe Asn Phe His Trp Thr Lys Ser Tyr Ser 
780 785 790 

GAC TAT CGC ATA GCC TGC CTG AAG CCT GGA AAA TAC AAG GTT GCC 
TTG 2571 

Asp Tyr Arg Ile Ala Cys Leu Lys Pro Gly Lys Tyr Lys Val Ala Leu
795 800 805 

GAC TCA GAT GAT CCA CTT TTT GGT GGC TTC GGG AGA ATT GAT CAT 
AAT 2619 

Asp Ser Asp Asp Pro Leu Phe Gly Gly Phe Gly Arg Ile Asp His Asn
810 815 820 825 



GCC GAA TAT TTC ACC TTT GAA GGA TGG TAT GAT GAT CGT CCT CGT 
TCA 2667 

Ala Glu Tyr Phe Thr Phe Glu Gly Trp Tyr Asp Asp Arg Pro Arg Ser 
830 835 840 

ATT ATG GTG TAT GCA CCT TGT AAA ACA GCA GTG GTC TAT GCA CTA 
GTA 2715 

Ile Met Val Tyr Ala Pro Cys Lys Thr Ala Val Val Tyr Ala Leu Val
845 850 855 

GAC AAA GAA GAA GAA GAA GAA GAA GAA GAA GAA GAA GAA GTA GCA 
GCA 2763 

Asp Lys Glu Glu Glu Glu Glu Glu Glu Glu Glu Glu Glu Val Ala Ala 
860 865 870 

GTA GAA GAA GTA GTA GTA GAA GAA GAA TGAACGAACT TGTGATCGCG 
2810 

Val Glu Glu Val Val Val Glu Glu Glu 
875 880 

TTGAAAGATT TGAACGCTAC ATAGAGCTTC TTGACGTATC TGGCAATATT 
GCATCAGTCT 2870 

TGGCGGAATT TCATGTGACA CAAGGTTTGC AATTCTTTCC ACTATTAGTA 
GTGCAACGAT 2930 

ATACGCAGAG ATGAAGTGCT GAACAAACAT ATGTAAAATC GATGAATTTA 
TGTCGAATGC 2990 

TGGGACGATC GAATTCCTGC AGGCCGGGGG ACCCCTTAGT TCT 
3033 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 882 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Met Val Tyr Thr Leu Ser Gly Val Arg Phe Pro Thr Val Pro Ser Val 
1 5 10 15



Tyr Lys Ser Asn Gly Phe Ser Ser Asn Gly Asp Arg Arg Asn Ala Asn 
20 25 30 

Val Ser Val Phe Leu Lys Lys His Ser Leu Ser Arg Lys Ile Leu Ala
35 40 45 

Glu Lys Ser Ser Tyr Asn Ser Glu Phe Arg Pro Ser Thr Val Ala Ala 
50 55 60 

Ser Gly Lys Val Leu Val Pro Gly Thr Gln Ser Asp Ser Ser Ser Ser
65 70 75 80 

Ser Thr Asp Gln Phe Glu Phe Thr Glu Thr Ser Pro Glu Asn Ser Pro
85 90 95 

Ala Ser Thr Asp Val Asp Ser Ser Thr Met Glu His Ala Ser Gln Ile
100 105 110

Lys Thr Glu Asn Asp Asp Val Glu Pro Ser Ser Asp Leu Thr Gly Ser 
115 120 125 

Val Glu Glu Leu Asp Phe Ala Ser Ser Leu Gln Leu Gln Glu Gly Gly
130 135 140 

Lys Leu Glu Glu Ser Lys Thr Leu Asn Thr Ser Glu Glu Thr Ile Ile
145 150 155 160 

Asp Glu Ser Asp Arg Ile Arg Glu Arg Gly Ile Pro Pro Pro Gly Leu
165 170 175 

Gly Gln Lys Ile Tyr Glu Ile Asp Pro Leu Leu Thr Asn Tyr Arg Gln
180 185 190 

His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys Lys Leu Arg Glu Ala Ile
195 200 205 

Asp Lys Tyr Glu Gly Gly Leu Glu Ala Phe Ser Arg Gly Tyr Glu Lys 
210 215 220 

Met Gly Phe Thr Arg Ser Ala Thr Gly Ile Thr Tyr Arg Glu Trp Ala
225 230 235 240 

Leu Gly Ala Gln Ser Ala Ala Leu Ile Gly Asp Phe Asn Asn Trp Asp
245 250 255 

Ala Asn Ala Asp Ile Met Thr Arg Asn Glu Phe Gly Val Trp Glu Ile
260 265 270

Phe Leu Pro Asn Asn Val Asp Gly Ser Pro Ala Ile Pro His Gly Ser
275 280 285 

Arg Val Lys Ile Arg Met Asp Thr Pro Ser Gly Val Lys Asp Ser Ile
290 295 300 

Pro Ala Trp Ile Asn Tyr Ser Leu Gln Leu Pro Asp Glu Ile Pro Tyr
305 310 315 320 

Asn Gly Ile His Tyr Asp Pro Pro Glu Glu Glu Arg Tyr Ile Phe Gln
325 330 335 

His Pro Arg Pro Lys Lys Pro Lys Ser Leu Arg Ile Tyr Glu Ser His
340 345 350 

Ile Gly Met Ser Ser Pro Glu Pro Lys Ile Asn Ser Tyr Val Asn Phe
355 360 365 

Arg Asp Glu Val Leu Pro Arg Ile Lys Lys Leu Gly Tyr Asn Ala Leu
370 375 380 

Gln Ile Met Ala Ile Gln Glu His Ser Tyr Tyr Ala Ser Phe Gly Tyr
385 390 395 400 

His Val Thr Asn Phe Phe Ala Pro Ser Ser Arg Phe Gly Thr Pro Asp 
405 410 415 

Asp Leu Lys Ser Leu Ile Asp Lys Ala His Glu Leu Gly Ile Val Val
420 425 430 

Leu Met Asp Ile Val His Ser His Ala Ser Asn Asn Thr Leu Asp Gly
435 440 445 

Leu Asn Met Phe Asp Cys Thr Asp Ser Cys Tyr Phe His Ser Gly Ala 
450 455 460 

Arg Gly Tyr His Trp Met Trp Asp Ser Arg Leu Phe Asn Tyr Gly Asn 
465 470 475 480 

Trp Glu Val Leu Arg Tyr Leu Leu Ser Asn Ala Arg Trp Trp Leu Asp 
485 490 495 

Ala Phe Lys Phe Asp Gly Phe Arg Phe Asp Gly Val Thr Ser Met Met 
500 505 510 



Tyr Ile His His Gly Leu Ser Val Gly Phe Thr Gly Asn Tyr Glu Glu
515 520 525 

Tyr Phe Gly Leu Ala Thr Asp Val Asp Ala Val Val Tyr Leu Met Leu 
530 535 540 

Val Asn Asp Leu Ile His Gly Leu Phe Pro Asp Ala Ile Thr Ile Gly
545 550 555 560 

Glu Asp Val Ser Gly Met Pro Thr Phe Cys Ile Pro Val Gln Glu Gly
565 570 575 

Gly Val Gly Phe Asp Tyr Arg Leu His Met Ala Ile Ala Asp Lys Arg
580 585 590 

Ile Glu Leu Leu Lys Lys Arg Asp Glu Asp Trp Arg Val Gly Asp Ile
595 600 605 

Val His Thr Leu Thr Asn Arg Arg Trp Ser Glu Lys Cys Val Ser Tyr 
610 615 620 

Ala Glu Ser His Asp Gln Ala Leu Val Gly Asp Lys Thr Ile Ala Phe
625 630 635 640 

Trp Leu Met Asp Lys Asp Met Tyr Asp Phe Met Ala Leu Asp Arg Pro 
645 650 655 

Ser Thr Ser Leu Ile Asp Arg Gly Ile Ala Leu His Lys Met Ile Arg
660 665 670 

Leu Val Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu Asn Phe Met Gly 
675 680 685 

Asn Glu Phe Gly His Pro Glu Trp Ile Asp Phe Pro Arg Ala Glu Gln
690 695 700 

His Leu Ser Asp Gly Ser Val Ile Pro Gly Asn Gln Phe Ser Tyr Asp
705 710 715 720 

Lys Cys Arg Arg Arg Phe Asp Leu Gly Asp Ala Glu Tyr Leu Arg Tyr 
725 730 735

Arg Gly Leu Gln Glu Phe Asp Arg Pro Met Gln Tyr Leu Glu Asp Lys
740 745 750 



Tyr Glu Phe Met Thr Ser Glu His Gln Phe Ile Ser Arg Lys Asp Glu
755 760 765 

Gly Asp Arg Met Ile Val Phe Glu Lys Gly Asn Leu Val Phe Val Phe
770 775 780

Asn Phe His Trp Thr Lys Ser Tyr Ser Asp Tyr Arg Ile Ala Cys Leu
785 790 795 800 

Lys Pro Gly Lys Tyr Lys Val Ala Leu Asp Ser Asp Asp Pro Leu Phe 
805 810 815 

Gly Gly Phe Gly Arg Ile Asp His Asn Ala Glu Tyr Phe Thr Phe Glu
820 825 830 

Gly Trp Tyr Asp Asp Arg Pro Arg Ser Ile Met Val Tyr Ala Pro Cys
835 840 845 

Lys Thr Ala Val Val Tyr Ala Leu Val Asp Lys Glu Glu Glu Glu Glu 
850 855 860 

Glu Glu Glu Glu Glu Glu Val Ala Ala Val Glu Glu Val Val Val Glu 
865 870 875 880 

Glu Glu 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

TCATTAAAGA GGAGAAATTA ACTATGAGAG GATCTCACCA TCACCATCAC 
CATGGGATCT 60 

TGGCTGAAAA GTCTTCTTAC AATTCCGAAT TCCGACCTTC TACAGTTGCA 
GCATCGGGGA 120 



AAGTCCTTGT GCCTGGAACC CAGAGTGATA GCTCCTCATC CTCAACAAAC 
CAATTTGAGT 180 

TCACTGAGAC ATCTCCAGAA AATTCCCCAG CATCAACTGA TGTAGATAGT 
TCAACAATGG 240 

AACACGCTAG CCAGATTAAA ACTGAGAACG ATGACGTTGA 
GCCGTCAAGT GATCTTACAG 300 

GAAGTGTTGA AGAGCTGGAT TTTGCTTCAT CACTACAACT ACAAGAAGGT 
GGTAAACTGG 360 

AGGAGTCTAA AACATTAAAT ACTTCTGAAG AGACAATTAT TGATGAATCT 
GATAGGATCA 420 

GAGAGAGGGG CATCCCTCCA CCTGGACTTG GTCAGAAGAT 
TTATGAAATA GACCCCCTTT 480 

TGACAAACTA TCGTCAACAC CTTGATTACA GGTATTCACA GTACAAGAAA 
CTGAGGGAGG 540 

CAATTGACAA GTATGAGGGT GGTTTGGAAG CTTTTTCTCG TGGTTATGAA 
AAAATGGGTT 600 

TCACTCGTAG TGCTACAGGT ATCACTTACC GTGAGTGGGC 
TCCTGGTGCC CAGTCAGCTG 660 

CCCTCATTGG AGATTTCAAC AATTGGGACG CAAATGCTGA CATTATGACT 
CGGAATGAAT 720 

TTGGTGTCTG GGAGATTTTT CTGCCAAATA ATGTGGATGG TTCTCCTGCA 
ATTCCTCATG 780 

GGTCCAGAGT GAAGATACGT ATGGACACTC CATCAGGTGT 
TAAGGATTCC ATTCCTGCTT 840 

GGATCAACTA CTCTACAGCT TCCTGATGAA ATTCCATATA ATGGAATATA 
TTATGATCCA 900 

CCCGAAGAGG AGAGGTATAT CTTCCAACAC CCACGGCCAA 
AGAAACCAAA GTCGCTGAGA 960 

ATATATGAAT CTCATATTGG AATGAGTAGT CCGGAGCCTA AAATTAACTC 
ATACGTGAAT 1020 



TTTAGAGATG AAGTTCTTCC TCGCATAAAA AAGCTTGGGT ACAATGCGCT 
GCAAATTATG 1080 

GCTATTCAAG AGCATTCTTA TTATGCTAGT TTTGGTTATC ATGTCACAAA 
TTTTTTTGCA 1140 

CCAAGCAGCC GTTTTGGAAC GCCCGACGAC CTTAAGTCTT TGATTGATAA 
AGCTCATGAG 1200 

CTAGGAATTG TTGTTCTCAT GGACATTGTT CACAGCCATG CATCAAATAA 
TACTTTAGAT 1260 

GGACTGAACA TGTTTGACGG CACCGATAGT TGTTACTTTC ACTCTGGAGC 
TCGTGGTTAT 1320 

CATTGGATGT GGGATTCCCG CCTTTTTAAC TATGGAAACT GGGAGGTACT 
TAGGTATCTT 1380 

CTCTCAAATG CGAGATGGTG GTTGGATGAG TTCAAATTTG ATGGATTTAG 
ATTTGATGGT 1440 

GTGACATCAA TGATGTATAC TCACCACGGA TTATCGGTGG GATTCACTGG 
GAACTACGAG 1500 

GAATACTTTG GACTCGCAAC TGATGTGGAT GCTGTTGTGT ATCTGATGCT 
GGTCAACGAT 1560 

CTTATTCATG GGCTTTTCCC AGATGCAATT ACCATTGGTG AAGATGTTAG 
CGGAATGCCG 1620 

ACATTTTGTA TTCCCGTTCA AGATGGGGGT GTTGGCTTTG ACTATCGGCT 
GCATATGGCA 1680 

ATTGCTGATA AATGGATTGA GTTGCTCAAG AAACGGGATG 
AGGATTGGAG AGTGGGTGAT 1740 

ATTGTTCATA CACTGACAAA TAGAAGATGG TCGGAAAAGT GTGTTTCATA 
CGCTGAAAGT 1800 

CATGATCAAG CTCTAGTCGG TGATAAAACT ATAGCATTCT GGCTGATGGA 
CAAGGATATG 1860 

TATGATTTTA TGGCTCTGGA TAGACCGCCA ACATCATTAA TAGATCGTGG 
GATAGCATTG 1920 



CACAAGATGA TTAGGCTTGT AACTATGGGA TTAGGAGGAG 
AAGGGTACCT AAATTTCATG 1980

GGAAATGAAT TCGGCCACCC TGAGTGGATT GATTTCCCTA 
GGGCTGAACA ACACCTCTCT 2040 

GATGACTCAG TAATTCCCGG AAACCAATTC AGTTATGATA AATGCAGACG 
GAGATTTGAC 2100 

CTGGGAGATG CAGAATATTT AAGATACCGT GGGTTGCAAG AATTTGACCG 
GGCTATGCAG 2160 

TATCTTGAAG ATAAATATGA GTTTATGACT TCAGAACACC AGTTCATATC 
ACGAAAGGAT 2220 

GAAGGAGATA GGATGATTGT ATTTGAAAAA GGAAACCTAG TTTTTGTCTT 
TAATTTTCAC 2280 

TGGACAAAAA GCTATTCAGA CTATCGCATA GGCTGCCTGA AGCCTGGAAA 
ATACAAGGTT 2340 

GCCTTGGACT CAGATGATCC ACTTTTTGGT GGCTTCGGGA GAATTGATCA 
TAATGCCGAA 2400 

TATTTCACCT TTGAAGGATG GTATGATGAT CGTCCTCGTT CAATTATGGT 
GTATGCACCT 2460 

TGTAGAACAG CAGTGGTCTA TGCACTAGTA GACAAAGAAG 
AAGAAGAAGA AGAAGAAGAA 2520 

GAAGAAGTAG CAGTAGTAGA AGAAGTAGTA GTAGAAGAAG 
AATGAACGAA CTTGTG 2576 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2529 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 



GGATGCTAAT GTTTCTGTAT TCTTGAAAAA GCACTCTCTT TCACGGAAGA 
TCTTGGCTGA 60 

AAAGTCTTCT TACAATTCCG AATCCCGACC TTCTACAGTT GCAGCATCGG 
GGAAAGTCCT 120 

TGTGCCTGGA AYCCAGAGTG ATAGCTCCTC ATCCTCAACA GACCAATTTG 
AGTTCACTGA 180 

GACATCTCCA GAAAATTCCC CAGCATCAAC TGATGTAGAT AGTTCAACAA 
TGGAACACGC 240 

TAGCCAGATT AAAACTGAGA ACGATGACGT TGAGCCGTCA AGTGATCTTA 
CAGGAAGTGT 300 

TGAAGAGCTG GATTTTGCTT CATCACTACA ACTACAAGAA GGTGGTAAAC 
TGGAGGAGTC 360 

TAAAACATTA AATACTTCTG AAGAGACAAT TATTGATGAA TCTGATAGGA 
TCAGAGAGAG 420 

GGGCATCCCT CCACCTGGAC TTGGTCAGAA GATTTATGAA 
ATAGACCCCC TTTTGACAAA 480 

CTATCGTCAA CACCTTGATT ACAGGTATTC ACAGTACAAG AAACTGAGGG 
AGGCAATTGA 540 

CAAGTATGAG GGTGGTTTGG AAGCTTTTTC TCGTGGTTAT GAAAAAATGG
GTTTCACTCG 600 

TAGTGCTACA GGTATCACTT ACCGTGAGTG GGCTCCTGGT 
GCCCAGTCAG CTGCCCTCAT 660 

TGGAGATTTC AACAATTGGG ACGCAAATGC TGACATTATG ACTCGGAATG 
AATTTGGTGT 720 

CTGGGAGATT TTTCTGCCAA ATAATGTGGA TGGTTCTCCT GCAATTCCTC 
ATGGGTCCAG 780 

AGTGAAGATA CGYATGGACA CTCCATCAGG TGTTAAGGAT TCCATTCCTG 
CTTGGATCAA 840 

CTACTCTTTA CAGCTTCCTG ATGAAATTCC ATATAATGGA ATATATTATG 
ATCCACCCGA 900 



AGAGGAGAGG TATRTCTTCC AACACCCACG GCCAAAGAAA 
CCAAAGTCGC TGAGAATATA 960 



TGAATCTCAT ATTGGAATGA GTAGTCCGGA GCCTAAAATT AACTCATACG 
TGAATTTTAG 1020 

AGATGAAGTT CTTCCTCGCA TAAAAAASCT TGGGTACAAT GCGGTGCAAA 
TTATGGCTAT 1080 

TCAAGAGCAT TCTTATTATG CTAGTTTTGG TTATCATGTC ACAAATTTTT 
TTGCACCAAG 1140 

CAGCCGTTTT GGAACGCCCG ACGACCTTAA GTCTTTGATT GATAAAGCTC 
ATGAGCTAGG 1200 

AATTGTTGTT CTCATGGACA TTGTTCACAG CCATGCATCA AATAATACTT 
TAGATGGACT 1260 

GAACATGTTT GACGGCACAG ATAGTTGTTA CTTTCACTCT GGAGCTCGTG 
GTTATCATTG 1320 

GATGTGGGAT TCCCGCCTCT TTAACTATGG AAACTGGGAG GTACTTAGGT 
ATCTTCTCTC 1380 

AAATGCGAGA TGGTGGTTGG ATGAGTTCAA ATTTGATGGA TTTAGATTTG 
ATGGTGTGAC 1440 

ATCAATGATG TATACTCACC ACGGATTATC GGTGGGATTC ACTGGGAACT 
ACGAGGAATA 1500 

CTTTGGACTC GCAACTGATG TGGATGCTGT TGTGTATCTG ATGCTGGTCA 
ACGATCTTAT 1560 

TCACGGGCTT TTCCCAGATG CAATTACCAT TGGTGAAGAT GTTAGCGGAA 
TGCCGACATT 1620 

TTGTATTCCC GTTCAAGATG GGGGTGTTGG CTTTGACTAT CGGCTGCATA 
TGGCAATTGC 1680 

TGATAAATGG ATTGAGTTGC TCAAGAAACG GGATGAGGAT 
TGGAGAGTGG GTGATATTGT 1740 

TCATACACTG ACAAATAGAA GATGGTCGGA AAAGTGTGTT TCATMCGCTG 
AAAGTCATGA 1800 



TCAAGCTCTA GTCGGTGATA AAACTATAGC ATYCTGGCTG ATGGACAAGG 
ATATGTATGA 1860 



TTTTATGGCT CTGGATAGAC CGYCAACAYC ATTAATAGAT CGTGGGATAG 
CATTGCACAA 1920 

GATGATTAGG CTTGTAACTA TGGGATTAGG AGGAGAAGGG TACCTAAATT 
TCATGGGAAA 1980 

TGAATTCGGC CACCCTGAGT GGATTGATTT CCCTAGGGCT 
GARCAACACC TCTCTGATGG 2040 

CTCAGTAATT CCCGGAAACC AATTCAGTTA TGATAAATGC AGACGGAGAT 
TTGACCTGGG 2100 

AGATGCAGAA TATTTAAGAT ACCATGGGTT GCAAGAATTT GACCGGGCTA 
TGCAGTATCT 2160 

TGAAGATAAA TATGAGTTTA TGACTTCAGA ACACCAGTTC ATATCACGAA 
AGGATGAAGG 2220 

AGATAGGATG ATTGTATTTG AAARAGGAAA CCTAGTTTTT GTCTTTAATT
TTCACTGGAC 2280 

AAATAGCTAT TCAGACTATC GCATAGGCTG CCTGAAGCCT GGAAAATACA 
AGGTTGGCTT 2340 

GGACTCAGAT GATCCACTTT TTGGTGGCTT CGGGAGAATT GATCATAATG 
CCGAATATTT 2400 

CACCTCTGAA GGATCGTATG ATGATCGTCC TCGTTCAATT ATGGTGTATG 
CACCTAGTAG 2460 

AACAGCAGTG GTCTATGCAC TAGTAGACAA ANTAGAAGNA 
GAAGAAGAAG AAGAANCCGN 2520 

NGAAGAATT 2529 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3231 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

GATTTAATAC GACTCACTAT AGGGATTTTT TTTTTTTTTT TTTTAAAAAC 
CTCCTCCACT 60 

CAGTCTTGGG ATCTCTCTCT CTCTTCACGC TTCTCTTGGG GCCTTGAACT 
CAGCAATTTG 120 

ACACTCAGTT AGTTACACTC CTATCACTCA TCAGATCTCT ATTTTTTCTC
TTAATTCCAA 180 

CCAAGGAATG AATTAAAAGA TTAGATTTGA AGGAGAGAAG AAGAAAGATG 
GTGTATACAC 240 

TCTCTGGAGT TCGTTTTCCT ACTGTTCCAT CAGTGTACAA ATCTAATGGA 
TTCAGCAGTA 300 

ATGGTGATCG GAGGAATGCT AATGTTTCTG TATTCTTGAA AAAGCACTCT 
CTTTCACGGA 360 

AGATCTTGGC TGAAAAGTCT TCTTACGATT CCGAATCCCG ACCTTCTACA 
GTTGCAGCAT 420 

CGGGGAAAGT CCTTGTACCT GGAATCCAGA GTGATAGCTC 
CTCATCCTCA ACAGACCAAT 480 

TTGAGTTCAC TGAGACAGCT CCAGAAAATT CCCCAGCATC AACTGATGTG 
GATAGTTCAA 540 

CAATGGAACA CGCTAGCCAG ATTAAAACTG AGAACGATGA 
CGTTGAGCCG TCAAGTGATC 600 

TTACAGGAAG TGTTGAAGAG TTGGATTTTG CTTCATCACT ACAACTACAA 
GAAGGTGGTA 660 

AACTGGAGGA GTCTAAAACA TTAAATACTT CTGAAGAGAC AATTATTGAT 
GAATCTGATA 720 

GGATCAGAGA GAGGGGCATC CCTCCACCTG GACTTGGTCA 
GAAGATTTAT GAAATAGACC 780 



CCCTTTTGAC AAACTATCGT CAACACCTTG ATTACAGGTA TTCACAGTAC 
AAGAAAATGA 840 



GGGAGGCAAT TGACAAGTAT GAGGGTGGTT TGGAAGCTTT 
TTCTCGTGGT TATGAAAAAA 900 

TGGGTTTCAC TCGTAGTGCT ACAGGTATCA CTTACCGTGA GTGGGCTCCT 
GGTGCCCAGT 960 

CAGCTGCTCT CATTGGAGAT TTCAACAATT GGGACGCAAA TGCTGACATT 
ATGACTCGGA 1020 

ATGAATTTGG TGTCTGGGAG ATTTTTCTGC CAAATAATGT GGATGGTTCT 
CCTGCAATTC 1080 

CTCATGGGTC CAGAGTGAAG ATACGCATGG ACACTTCATC 
AGGTGTTAAG GATTCCATTC 1140 

CTGCTTGGAT CAACTACTCT TTACAGCTTC CTGATGAAAT TCCATATAAT 
GGAATATATT 1200 

ATGATCCACC CGAAGAGGAG AGGTATGTCT TCCAACACCC 
ACGGCCAAAG AAACCAAAGT 1260 

CGCTGAGAAT ATATGAATCT CATATTGGAA TGAGTAGTCC GGAGCCTAAA 
ATTAACTCAT 1320 

ACGTGAATTT TAGAGATGAA GTTCTTCCTC GCATAAAAAA CCTTGGGTAC 
AATGCGGTGC 1380 

AAATTATGGC TATTCAAGAG CATTCTTATT ATGCTAGTTT TGGTTATCAT 
GTCACAAATT 1440 

TTTTTGCACC AAGCAGCCGT TTTGGAACGC CCGACGACCT TAAGTCTTTG 
ATTGATAAAG 1500 

CTCATGAGCT AGGAATTGTT GTTCTCATGG ACATTGTTCA CAGCCATGCA 
TCAAATAATA 1560 

CTTTAGATGG ACTGAACATG TTTGACGGCA CAGATAGTTG TTACTTTCAC 
TCTGGAGCTC 1620 

GTGGTTATCA TTGGATGTGG GATTCCCGCC TCTTTAACTA TGGAAACTGG 
GAGGTACTTA 1680 



GGTATCTTCT CTCAAATGCG AGATGGTGGT TGGATGAGTG CAAATTTGRT 
GGATTTAGAT 1740 

TTGATGGTGT GACATCAATG ATGTATACTC ACCACGGATT ATCGGTGGGA 
TTCACTGGGA 1800 

ACTACGAGGA ATACTTTGGA CTCGCAACTG ATGTRGATGC TGCCGTGTAT 
CTGATGCTGG 1860 

CCAACGATCT TATTCATGGG CTTTTCCCAG ATGCAATTAC CATTGGTGAA 
GATGTTAGCG 1920 

GAATGCCGAC ATTTTGTATT CCCGTTCAAG ATGGGGGTGT TGGCTTTGAC 
TATCGGCTGC 1980 

ATATGGCAAT TGCTGATAAA TGGATTGAGT TGCTCAAGAA ACGGGATGAG 
GATTGGAGAG 2040 

TGGGTGATAT TGTTCATACA CTGACAAATA GAAGATGGTC GGAAAAGTGT 
GTTTCATACG 2100 

CTGAAAGTCA TGATCAAGCT CTAGTCGGTG ATAAAACTAT AGCATTCTGG 
CTGATGGACA 2160 

AGGATATGTA TGATTTTATG GCTTTGGATA GACCGTCAAC ATCATTAATA 
GATCGTGGGA 2220 

TAGCATTGCA CAAGATGATT AGGCTTGTAA CTATGGGATT AGGAGGAGAA 
GGGTACCTAA 2280 

ATTTCATGGG AAATGAATTC GGCCACCCTG AGTGGATTGA TTTCCCTAGG 
GCTGAACAAC 2340 

ACCTCTCTGA TGGCTCAGTA ATTCCCGGAA ACCAATTCAG TTATGATAAA 
TGCAGACGGA 2400 

GATTTGACCT GGGAGATGCA GAATATTTAA GATACCGTGG GTTGCAAGAA 
TTTGACCGGG 2460 

CTATGCAGTA TCTTGAAGAT AAATATGAGT TTATGACTTC AGAACACCAG 
TTCATATCAC 2520 

GAAAGGATGA AGGAGATAGG ATGATTGTAT TTGAAAAAGG AAACCTAGTT 
TTTGTCTTTA 2580 



ATTTTCACTG GACAAAAAGC TATTCAGACT ATCGCATAGG CTGGCTGAAG 
CCTGGAAAAT 2640 

ACAAGGTTGC CTTGGACTCA GATGATCCAC TTTTTGGTGG 
CTTCGGGAGA ATTGATCATA 2700 

ATGCCGAATG TTTCACCTTT GAAGGATGGT ATGATGATCG TCCTCGTTCA 
ATTATGGTGT 2760 

ATGCACCTAG TAGAACAGCA GTGGTCTATG CACTAGTAGA CAAAGAAGAA 
GAAGAAGAAG 2820 

AAGTAGCAGT AGTAGAAGAA GTAGTAGTAG AAGAAGAATG AACGAACTTG 
TGATCGCGTT 2880 

GAAAGATTTG AACGCTACAT AGAGCTTCTT GACGTATCTG GCAATATTGC 
ATCAGTCTTG 2940 

GCGGAATTTC ATGTGACAAA AGGTTTGCAA TTCTTTCCAC TATTAGTAGT 
GCAACGATAT 3000 

ACGCAGAGAT GAAGTGCTGA ACAAACATAT GTAAAATCGA TGAATTTATG 
TCGAATGCTG 3060 

GGACGGGCTT CAGCAGGTTT TGCTTAGTGA GTTCTGTAAA TTGTCATCTC 
TTTANATGTA 3120 

CAGCCCACTA GAAATCAATT ATGTGAGACC TAAAAAACAA TAACCATAAA 
ATGGAAATAG 3180 

TGCTGATCTA ATGATGTTTT AANCCNNNNA AAAAAAAAAA AAAAACTCGA 
G 3231 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2578 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 



TCATTAAAGA GGAGAAATTA ACTATGAGAG GATCTCACCA TCACCATCAC 
CATGGGATCT 60 

TGGCTGAAAA GTCTTCTTAC AATTCCGAAT TCCGACCTTC TACAGTTGCA 
GCATCGGGGA 120 

AAGTCCTTGT GCCTGGAACC CAGAGTGATA GCTCCTCATC CTCAACAAAC 
CAATTTGAGT 180 

TCACTGAGAC ATCTCCAGAA AATTCCCCAG CATCAACTGA TGTAGATAGT 
TCAACAATGG 240 

AACACGCTAG CCAGATTAAA ACTGAGAACG ATGACGTTGA 
GCCGTCAAGT GATCTTACAG 300 

GAAGTGTTGA AGAGCTGGAT TTTGCTTCAT CACTACAACT ACAAGAAGGT 
GGTAAACTGG 360 

AGGAGTCTAA AACATTAAAT ACTTCTGAAG AGACAATTAT TGATGAATCT 
GATAGGATCA 420 

GAGAGAGGGG CATCCCTCCA CCTGGACTTG GTCAGAAGAT 
TTATGAAATA GACCCCCTTT 480 

TGACAAACTA TCGTCAACAC CTTGATTACA GGTATTCACA GTACAAGAAA 
CTGAGGGAGG 540 

CAATTGACAA GTATGAGGGT GGTTTGGAAG CTTTTTCTCG TGGTTATGAA 
AAAATGGGTT 600 

TCACTCGTAG TGCTACAGGT ATCACTTACC GTGAGTGGGC 
TCCTGGTGCC CAGTCAGCTG 660 

CCCTCATTGG AGATTTCAAC AATTGGGACG CAAATGCTGA CATTATGACT 
CGGAATGAAT 720 

TTGGTGTCTG GGAGATTTTT CTGCCAAATA ATGTGGATGG TTCTCCTGCA 
ATTCCTCATG 780 

GGTCCAGAGT GAAGATACGT ATGGACACTC CATCAGGTGT 
TAAGGATTCC ATTCCTGCTT 840 



GGATCAACTA CTCTTCACAG CTTCCTGATG AAATTCCATA TAATGGAATA 
TATTATGATC 900 



CACCCGAAGA GGAGAGGTAT ATCTTCCAAC ACCCACGGCC 
AAAGAAACCA AAGTCGCTGA 960 



GAATATATGA ATCTCATATT GGAATGAGTA GTCCGGAGCC TAAAATTAAC 
TCATACGTGA 1020 

ATTTTAGAGA TGAAGTTCTT CCTCGCATAA AAAAGCTTGG GTACAATGCG 
GTGCAAATTA 1080 

TGGCTATTCA AGAGCATTCT TATTATGCTA GTTTTGGTTA TCATGTCACA 
AATTTTTTTG 1140 

CACCAAGCAG CCGTTTTGGA ACGCCCGACG ACCTTAAGTC TTTGATTGAT 
AAAGCTCATG 1200 

AGCTAGGAAT TGTTGTTCTC ATGGACATTG TTCACAGCCA TGCATCAAAT 
AATACTTTAG 1260 

ATGGACTGAA CATGTTTGAC GGCACCGATA GTTGTTACTT TCACTCTGGA 
GCTCGTGGTT 1320 

ATCATTGGAT GTGGGATTCC CGCCTTTTTA ACTATGGAAA CTGGGAGGTA 
CTTAGGTATC 1380 

TTCTCTCAAA TGCGAGATGG TGGTTGGATG AGTTCAAATT TGATGGATTT 
AGATTTGATG 1440 

GTGTGACATC AATGATGTAT ACTCACCACG GATTATCGGT GGGATTCACT 
GGGAACTACG 1500 

AGGAATACTT TGGACTCGCA ACTGATGTGG ATGCTGTTGT GTATCTGATG 
CTGGTCAACG 1560 

ATCTTATTCA TGGGCTTTTC CCAGATGCAA TTACCATTGG TGAAGATGTT 
AGCGGAATGC 1620 

CGACATTTTG TATTCCCGTT CAAGATGGGG GTGTTGGCTT TGACTATCGG 
CTGCATATGG 1680 

CAATTGCTGA TAAATGGATT GAGTTGCTCA AGAAACGGGA TGAGGATTGG 
AGAGTGGGTG 1740 



ATATTGTTCA TACACTGACA AATAGAAGAT GGTCGGAAAA GTGTGTTTCA 
TACGCTGAAA 1800 



GTCATGATCA AGCTCTAGTC GGTGATAAAA CTATAGCATT CTGGCTGATG 
GACAAGGATA 1860 



TGTATGATTT TATGGCTCTG GATAGACCGC CAACATCATT AATAGATCGT 
GGGATAGCAT 1920 

TGCACAAGAT GATTAGGCTT GTAACTATGG GATTAGGAGG 
AGAAGGGTAC CTAAATTTCA 1980 

TGGGAAATGA ATTCGGCCAC CCTGAGTGGA TTGATTTCCC 
TAGGGCTGAA CAACACCTCT 2040 

CTGATGACTC AGTAATTCCC GGAAACCAAT TCAGTTATGA TAAATGCAGA 
CGGAGATTTG 2100 

ACCTGGGAGA TGCAGAATAT TTAAGATACC GTGGGTTGCA AGAATTTGAC 
CGGGCTATGC 2160 

AGTATCTTGA AGATAAATAT GAGTTTATGA CTTCAGAACA CCAGTTCATA 
TCACGAAAGG 2220 

ATGAAGGAGA TAGGATGATT GTATTTGAAA AAGGAAACCT AGTTTTTGTC 
TTTAATTTTC 2280 

ACTGGACAAA AAGCTATTCA GACTATCGCA TAGGCTGCCT 
GAAGCCTGGA AAATACAAGG 2340 

TTGCCTTGGA CTCAGATGAT CCACTTTTTG GTGGCTTCGG GAGAATTGAT 
CATAATGCCG 2400 

AATATTTCAC CTTTGAAGGA TGGTATGATG ATCGTCCTCG TTCAATTATG 
GTGTATGCAC 2460 

CTTGTAGAAC AGCAGTGGTC TATGCACTAG TAGACAAAGA AGAAGAAGAA 
GAAGAAGAAG 2520 

AAGAAGAAGT AGCAGTAGTA GAAGAAGTAG TAGTAGAAGA 
AGAATGAACG AACTTGTG 2578 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
AATTTYATGG GNAAYGARTT YGG 

SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANTS: 

(A) Cooke, David 

(B) Debet, Martine 

(C) Gidley, Michael John 

(D) Jobling, Stephen Alan 

(E) Safford, Richard 

(F) Sidebottom, Christopher Michael 

(G) Westcott, Roger John 

(ii) TITLE OF INVENTION: Improvements in or Relating to 
Starch Composition 

(iii) NUMBER OF SEQUENCES: 20 

(iv) CORRESPONDENCE ADDRESS: 

(A) NAME: National Starch and Chemical Company 

(B) STREET: 10 Finderne Avenue, P.O. Box 6500 

(C) CITY: Bridgewater 

(D) STATE: New Jersey 

(E) COUNTRY: United States 

(F) ZIP CODE: 08807-500 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: IBM 1.44 MB High Density Diskette 

(B) COMPUTER: COMPAQ Deskpro 590 (IBM PC compatible) 

(C) OPERATING SYSTEM: WINDOWS 95 

(D) SOFTWARE: Word 7.0 for Windows 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: Filed concurrently herewith 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA 

(A) APPLICATION NUMBER: PCT/GB96/01075 

(B) INTERNATIONAL FILING DATE: May 3, 1996 

(C) PRIORITY DATE: May 5, 1995 

(viii) ATTORNEY INFORMATION 

(A) NAME: Karen G. Kaiser 

(B) REGISTRATION NO: 33,506 

(C) DOCKET NUMBER: 1627 

(ix) TELECOMMUNICATION INFORMATION 

(A) TELEPHONE: (908) 575-6152 

(B) FACSIMILE: (908) 707-3706 

(C) E-MAIL: KAREN.KAISER@NSTARCH.COM 

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 
AAGGATCCGT CGACATCGAT AATACGACTC ACTATAGGGA TTTTTTTTTT TTTTTTT 



(2) INFORMATION FOR SEQ ID NO: 2: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
AAGGATCCGT CGACATC 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
GACATCGATA ATACGAC 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
CATCCAACCA CCATCTCGCA 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TTGAGAGAAG ATACCTAAGT 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
ATGTTCAGTC CATCTAAAGT 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
AGAACAACAA TTCCTAGCTC 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
GGGGCCTTGA ACTCAGCAAT 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CGTCCCAGCA TTCGACATAA 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
CTTGGATCCT TGAACTCAGC AATTTG 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
TAACTCGAGC AACGCGATCA CAAGTTCGT 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3003 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GATGGGGCCT TGAACTCAGC AATTTGACAC TCAGTTAGTT ACACTGCCAT CACTTATCAG 
ATCTCTATTT TTTCTCTTAA TTCCAACCAA GGAATGAATA AAAAGATAGA TTTGTAAAAA 
CCCTAAGGAG AGAAGAAGAA AGATGGTGTA TACACTCTCT GGAGTTCGTT TTCCTACTGT 
TCCATCAGTG TACAAATCTA ATGGATTCAG CAGTAATGGT GATCGGAGGA ATGCTAATAT 
TTCTGTATTC TTGAAAAAAC ACTCTCTTTC ACGGAAGATC TTGGCTGAAA AGTCTTCTTA 
CAATTCCGAA TCCCGACCTT CTACAATTGC AGCATCGGGG AAAGTCCTTG TGCCTGGAAT 
CCAGAGTGAT AGCTCCTCAT CCTCAACAGA TCAATTTGAG TTCGCTGAGA CATCTCCAGA 
AAATTCCCCA GCATCAACTG ATGTAGATAG TTCAACAATG GAACACGCTA GCCAGATTAA 
AACTGAGAAC GATGACGTTG AGCCGTCAAG TGATCTTACA GGAAGTGTTG AAGAGCTGGA 
TTTTGCTTCA TCACTACAAC TACAAGAAGG TGGTAAACTG GAGGAGTCTA AAACATTAAA 
TACTTCTGAA GAGACAATTA TTGATGAATC TGATAGGATC AGAGAGAGGG GCATCCCTCC 
ACCTGGACTT GGTCAGAAGA TTTATGAAAT AGACCCCCTT TTGACAAACT ATCGTCAACA 
CCTTGATTAC AGGTATTCAC AGTACAAGAA ACTGAGGGAG GCAATTGACA AGTATGAGGG 
TGGTTTGGAA GCTTTTTCTC GTGGTTATGA AAGAATGGGT TTCACTCGTA GTGCTACAGG 
TATCACTTAC CGTGAGTGGG CTCCTGGTGC CCAGTCAGCT GCCCTCATTG GGGATTTCAA 
CAATTGGGAC GCAAATGCTG ACTTTATGAC TCGGAATGAA TTTGGTGTCT GAGAGATTTT 
TCTGCCAAAT AATGTGGATG GTTCTCCTGC AATTCCTCAT GGGTCCAGAG TGAAGATACG 
TATGGACACT CCATCAGGTG TTAAGGATTC CATTCCTGCT TGGATCAACT ACTCTTTACA 
GCTTCCTGAT GAAATTCCAT ATAATGGAAT ATATTATGAT CCACCCGAAG AGGAGAGGTA 
TATCTTCCAA CACCCACGGC CAAAGAAACC AAAGTCGGTG AGAATATATG AATCTCATAT 
TGGAATGAGT AGTCCGGAGC CTAAAATTAA CTCATACGTG AATTTTAGAG ATGAAGTTCT 
TCCTCGCATA AAAAAAGCTT GGGTACAATG CGGTGCAAAT TATGGCTATT CAAGAGCATT 
CTTATTATGC TAGTTTTGGT TATCATGTCA CAAATTTTTT TGCACCAAGC AGCCGTTTTG 
GAACGCCCGA CGACCTTAAG TCTTTGATTG ATAAAGCTCA TGAGCTAGGA ATTGTTGTTC 
TCATGGACAT TGTTCACAGC CATGCATCAA ATAATACTTT AGATGGACTG AACATGTTTG 
ACGGCACAGA TAGTTGTTAC TTTCACTCTG GAGCTCGTGG TTATCATTGG ATGTGGGATT 
TCCGCCTCTT TAACTATGGA AACTGGGAGG TACTTAGGTA TCTTCTCTCA AATGCGAGAT 1620 

GGTGGTTGGA TGAGTTCAAA TTTGATGGAT TTAGATTTGA TGGTGTGACA TCAATGATGT 1680 

GTACTCACCA CGGATTATCG GTGGGATTCA CTGGGAACTA CGAGGAATAC TTTGGACTCG 1740 

CAACTGATGT GGATGCTGTT GTGTATCTGA TGCTGGTCAA CGATCTTATT CATGGGCTTT 1800 

TCCCAGATGC AATTACCATT GGTGAAGATG TTAGCGGAAT GCCGACATTT TGTGTTCCCG 1860 

TTCAAGATGG GGGTGTTGGC TTTGACTATC GGCTGCATAT GGCAATTGCT GATAAATGGA 1920 

TTGAGTTGCT CAAGAAACGG GATGAGGATT GGAGAGTGGG TGATATTGTT CATACACTGA 1980 

CAAATAGAAG ATGGTCGGAA AAGTGTGTTT CATACGCTGA AAGTCATGAT CAAGCTCTAG 2040 

TGGGTGATAA AACTATAGCA TTCTGGCTGA TGGACAAGGA TATGTATGAT TTTATGGCTC 2100 

TGGATAGACC GTCAACATCA TTAATAGATC GTGGGATAGC ATTACACAAG ATGATTAGGC 2160 

TTGTAACTAT GGGATTAGGA GGAGAAGGGT ACCTAAATTT CATGGGAAAT GAATTCGGCC 2220 

ACCCTGAGTG GATTGATTTC CCTAGGGCTG AACAACACCT CTCTGATGGC TCAGTAATTC 2280 

CCAGAAACCA ATTCAGTTAT GATAAATGCA GACGGAGATT TGACCTGGGA GATGCAGAAT 2340 

ATTTAAGATA CCGTGGGTTG CAAGAATTTG ACCGGGCTAT GCAGTATCTT GAAGATAAAT 2400 

ATGAGTTTAT GACTTCAGAA CACCAGTTCA TATCACGAAA GGATGAAGGA GATAGGATGA 2460 

TTGTATTTGA AAAAGGAAAC CTAGTTTTTG TCTTTAATTT TCACTGGACA AAAGGCTATT 2520 

CAGACTATCG CATAGGCTGC CTGAAGCCTG GAAAATACAA GGTTGCCTTG GACTCAGATG 2580 

ATCCACTTTT TGGTGGCTTC GGGAGAATTG ATCATAATGC CGAATATTTC ACCTTTGAAG 2640 

GATGGTATGA TGATCGTCCT CGTTCAATTA TGGTGTATGC ACCTAGTAGA ACAGCAGTGG 2700 

TCTATGCACT AGTAGACAAA GAAGAAGAAG AAGAAGAAGA AGTAGCAGTA GTAGAAGAAG 2760 

TAGTAGTAGA AGAAGAATGA ACGAACTTGT GATCGCGTTG AAAGATTTGA ACGCCACATA 2820 

GAGCTTCTTG ACGTATCTGG CAATATTGCA TTAGTCTTGG CGGAATTTCA TGTGACAACA 2880 

GGTTTGCAAT TCTTTCCACT ATTAGTAGTG CAACGATATA CGCAGAGATG AAGTGCTGAA 2940 

CAAAAACATA TGTAAAATCG ATGAATTTAT GTCGAATGCT GGGACGATCG AATTCCTGCA 3000 

GCC 3003 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2975 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

TTGATGGGCC TTGAACTCAG CAATTTGACA CTCAGTTAGT TACACTCCTA TCACTTATCA 60 

GATCTCTATT TTTTCTCTTA ATTCCAACCA GGGGAATGAA TAAAAGGATA GATTTGTAAA 120 

AACCCTAAGG AGAGAAGAAG AAAGATGGTG TATATACTCT CTGGAGTTCG TTTTCCTACT 180 

GTTCCATCAG TGTACAAATC TAATGGATTC AGCAGTAATG GTGATCGGAG GAATGCTAAT 240 

GTTTCTGTAT TCTTGAAAAA GCACTCTCTT TCACGGAAGA TCTTGGCTGA AAAGTCTTCT 300 

TACAATTCCG AATTCCGACC TTCTACAGTT GCAGCATCGG GGAAAGTCCT TGTGCCTGGA 360 

ACCCAGAGTG ATAGCTCCTC ATCCTCAACA GACCAATTTG AGTTCACTGA GACATCTCCA 420 

GAAAATTCCC CAGCATCAAC TGATGTAGAT AGTTCAACAA TGGAACACGC TAGCCAGATT 480 

AAAACTGAGA ACGATGACGT TGAGCCGTCA AGTGATCTTA CAGGAAGTGT TGAAGAGCTG 540 

GATTTTGCTT CATCACTACA ACTACAAGAA GGTGGTAAAC TGGAGGAGTC TAAAACATTA 600 

AATACTTCTG AAGAGACAAT TATTGATGAA TCTGATAGGA TCAGAGAGAG GGGCATCCCT 660 

CCACCTGGAC TTGGTCAGAA GATTTATGAA ATAGACCCCC TTTTGACAAA CTATCGTCAA 720 

CACCTTGATT ACAGGTATTC ACAGTACAAG AAACTGAGGG AGGCAATTGA CAAGTATGAG 780 

GGTGGTTTGG AAGCTTTTCT CGTGGTTATG AAAAAATGGG TTTCACTCGT AGTGCTACAG 840 

GTATCACTTA CCGTGAGTGG GCTCCTGGTG CCCAGTCAGC TGCCCTCATT GGAGATTTCA 900 

ACAATTGGGA CGCAAATGCT GACATTATGA CTCGGAATGA ATTTGGTGTC TGGGAGATTT 960 

TTCTGCCAAA TAATGTGGAT GGTTCTCCTG CAATTCCTCA TGGGTCCAGA GTGAAGATAC 1020 

GTATGGACAC TCCATCAGGT GTTAAGGATT CCATTCCTGC TTGGATCAAC TACTCTTTAC 1080 

AGCTTCCTGA TGAAATTCCA TATAATGGAA TATATTATGA TCCACCCGAA GAGGAGAGGT 1140 

ATATCTTCCA ACACCCACGG CCAAAGAAAC CAAAGTCGCT GAGAATATAT GAATCTCATA 1200 

TTGGAATGAG TAGTCCGGAG CCTAAAATTA ACTCATACGT GAATTTTAGA GATGAAGTTC 1260 

TTCCTCGCAT AAAAAAGCTT GGGTACAATG CGCTGCGAAT TATGGCTATT CAAGAGCATT 1320 

CTTATTATGC TAGTTTTGGT TATCATGTCA CAAATTTTTT TGCACCAAGC AGCCGTTTTG 1380 

GAACGCCCGA CGACCTTAAG TCTTCGATTG ATAAAGCTCA TGAGCTAGGA ATTGTTGTTC 1440 

TCATGGACAT CGTTCACAGC CATGCATCAA ATAATACTTT AGATGGACTG AACATGTTTG 1500 

ACGGCACCGA TAGTTGTTAC TTTCACTCTG GAGCTCGTGG TTATCATTGG ATGTGGGATT 1560 

CCGCCTCTTT AACTATGGAA ACTGGGAGGT ACTTAGGTAT CTTCTCTCAA ATGCGAGATG 1620 

GTGGTTGGAT GAGTTCAAAT TTGATGGATT TAGATTCGAT GGTGTGACAT CAATGATGTA 1680 

TACTCACCAC GGATTATCGG TGGGATTCAC TGGGAACTAC GAGGAATACT TTGGACTCGC 1740 

AACTGATGTG GATGCTGTTG TGTATCTGAT GCTGGTCAAC GATCTTATTC ATAGGCTTTT 1800 

CCCAGATGCA ATTACCATTG GTGAAGATGT TAGCGGAATG CCGACATTTT GTATTCCCGT 1860 

TCAAGATGGG GGTGTTGGCT TTGACTATCG GCTGCATATG GCAATTGCTG ATAAATGGAT 1920 

TGAGTTGCTC AAGAAACGGG ATGAGGATTG GAGAGTGGGT GATATTGTTC ATACACTGAC 1980 







GGT GAT CGG AGG AAT GCT AAT GTT TCT GTA TTC TTG AAA AAG CAC TCT 
270 275 280 

CCT GCA ATT CCT CAT GGG TCC AGA GTG AAG ATA CGT ATG GAC ACT CCA 1035 
Pro Ala Ile Pro His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro 
285 290 295 

TCA GGT GTT AAG GAT TCC ATT CCT GCT TGG ATC AAC TAC TCT TTA CAG 1083 
Ser Gly Val Lys Asp Ser Ile Pro Ala Trp Ile Asn Tyr Ser Leu Gln 
300 305 310 

CTT CCT GAT GAA ATT CCA TAT AAT GGA ATA CAT TAT GAT CCA CCC GAA 1131 
Leu Pro Asp Glu Ile Pro Tyr Asn Gly Ile His Tyr Asp Pro Pro Glu 
315 320 325 

GAG GAG AGG TAT ATC TTC CAA CAC CCA CGG CCA AAG AAA CCA AAG TCG 1179 
Glu Glu Arg Tyr Ile Phe Gln His Pro Arg Pro Lys Lys Pro Lys Ser 
330 335 340 345 

CTG AGA ATA TAT GAA TCT CAT ATT GGA ATG AGT AGT CCG GAG CCT AAA 1227 
Leu Arg Ile Tyr Glu Ser His Ile Gly Met Ser Ser Pro Glu Pro Lys 
350 355 360 

ATT AAC TCA TAC GTG AAT TTT AGA GAT GAA GTT CTT CCT CGC ATA AAA 1275 
Ile Asn Ser Tyr Val Asn Phe Arg Asp Glu Val Leu Pro Arg Ile Lys 
365 370 375 

AAG CTT GGG TAC AAT GCG CTG CAA ATT ATG GCT ATT CAA GAG CAT TCT 1323 
Lys Leu Gly Tyr Asn Ala Leu Gln Ile Met Ala Ile Gln Glu His Ser 
380 385 390 

TAT TAC GCT AGT TTT GGT TAT CAT GTC ACA AAT TTT TTT GCA CCA AGC 1371 
Tyr Tyr Ala Ser Phe Gly Tyr His Val Thr Asn Phe Phe Ala Pro Ser 
395 400 405 

AGC CGT TTT GGA ACG CCC GAC GAC CTT AAG TCT TTG ATT GAT AAA GCT 1419 
Ser Arg Phe Gly Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala 
410 415 420 425 

CAT GAG CTA GGA ATT GTT GTT CTC ATG GAC ATT GTT CAC AGC CAT GCA 1467 
His Glu Leu Gly Ile Val Val Leu Met Asp Ile Val His Ser His Ala 
430 435 440 

TCA AAT AAT ACT TTA GAT GGA CTG AAC ATG TTT GAC TGC ACC GAT AGT 1515 
Ser Asn Asn Thr Leu Asp Gly Leu Asn Met Phe Asp Cys Thr Asp Ser 
445 450 455 

TGT TAC TTT CAC TCT GGA GCT CGT GGT TAT CAT TGG ATG TGG GAT TCC 1563 
Cys Tyr Phe His Ser Gly Ala Arg Gly Tyr His Trp Met Trp Asp Ser 
460 465 470 

CGC CTC TTT AAC TAT GGA AAC TGG GAG GTA CTT AGG TAT CTT CTC TCA 1611 
Arg Leu Phe Asn Tyr Gly Asn Trp Glu Val Leu Arg Tyr Leu Leu Ser 
475 480 485 

AAT GCG AGA TGG TGG TTG GAT GCG TTC AAA TTT GAT GGA TTT AGA TTT 1659 
Asn Ala Arg Trp Trp Leu Asp Ala Phe Lys Phe Asp Gly Phe Arg Phe 
490 495 500 505 

GAT GGT GTG ACA TCA ATG ATG TAT ATT CAC CAC GGA TTA TCG GTG GGA 1707 
Asp Gly Val Thr Ser Met Met Tyr Ile His His Gly Leu Ser Val Gly 
510 515 520 

TTC ACT GGG AAC TAC GAG GAA TAC TTT GGA CTC GCA ACT GAT GTG GAT 1755 
Phe Thr Gly Asn Tyr Glu Glu Tyr Phe Gly Leu Ala Thr Asp Val Asp 
525 530 535 

GCT GTT GTG TAT CTG ATG CTG GTC AAC GAT CTT ATT CAT GGG CTT TTC 
Ala Val Val Tyr Leu Met Leu Val Asn Asp Leu Ile His Gly Leu Phe 
540 545 550 

CCA GAT GCA ATT ACC ATT GGT GAA GAT GTT AGC GGA ATG CCG ACA TTT 
Pro Asp Ala Ile Thr Ile Gly Glu Asp Val Ser Gly Met Pro Thr Phe 
555 560 565 

TGT ATT CCC GTC CAA GAG GGG GGT GTT GGC TTT GAC TAT CGG CTG CAT 
Cys Ile Pro Val Gln Glu Gly Gly Val Gly Phe Asp Tyr Arg Leu His 
570 575 580 585 

ATG GCA ATT GCT GAT AAA CGG ATT GAG TTG CTC AAG AAA CGG GAT GAG 
Met Ala Ile Ala Asp Lys Arg Ile Glu Leu Leu Lys Lys Arg Asp Glu 
590 595 600 

GAT TGG AGA GTG GGT GAT ATT GTT CAT ACA CTG ACA AAT AGA AGA TGG 
Asp Trp Arg Val Gly Asp Ile Val His Thr Leu Thr Asn Arg Arg Trp 
605 610 615 

TCG GAA AAG TGT GTT TCA TAC GCT GAA AGT CAT GAT CAA GCT CTA GTC 
Ser Glu Lys Cys Val Ser Tyr Ala Glu Ser His Asp Gln Ala Leu Val 
620 625 630 

GGT GAT AAA ACT ATA GCA TTC TGG CTG ATG GAC AAG GAT ATG TAT GAT 
Gly Asp Lys Thr Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp 
635 640 645 

TTT ATG GCT CTG GAT AGA CCG TCA ACA TCA TTA ATA GAT CGT GGG ATA 
Phe Met Ala Leu Asp Arg Pro Ser Thr Ser Leu Ile Asp Arg Gly Ile 
650 655 660 665 

GCA TTG CAC AAG ATG ATT AGG CTT GTA ACT ATG GGA TTA GGA GGA GAA 
Ala Leu His Lys Met Ile Arg Leu Val Thr Met Gly Leu Gly Gly Glu 
670 675 680 

GGG TAC CTA AAT TTC ATG GGA AAT GAA TTC GGC CAC CCT GAG TGG ATT 
Gly Tyr Leu Asn Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile 
685 690 695 

GAT TTC CCT AGG GCT GAA CAA CAC CTC TCT GAT GGC TCA GTA ATC CCC 
Asp Phe Pro Arg Ala Glu Gln His Leu Ser Asp Gly Ser Val Ile Pro 
700 705 710 

GGA AAC CAA TTC AGT TAT GAT AAA TGC AGA CGG AGA TTT GAC CTG GGA 
Gly Asn Gln Phe Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly 
715 720 725 

GAT GCA GAA TAT TTA AGA TAC CGT GGG TTG CAA GAA TTT GAC CGG CCT 
Asp Ala Glu Tyr Leu Arg Tyr Arg Gly Leu Gln Glu Phe Asp Arg Pro 
730 735 740 745 

ATG CAG TAT CTT GAA GAT AAA TAT GAG TTT ATG ACT TCA GAA CAC CAG 
Met Gln Tyr Leu Glu Asp Lys Tyr Glu Phe Met Thr Ser Glu His Gln 
750 755 760 

TTC ATA TCA CGA AAG GAT GAA GGA GAT AGG ATG ATT GTA TTT GAA AAA 
Phe Ile Ser Arg Lys Asp Glu Gly Asp Arg Met Ile Val Phe Glu Lys 
765 770 775 

GGA AAC CTA GTT TTT GTC TTT AAT TTT CAC TGG ACA AAA AGC TAT TCA 2523 
Gly Asn Leu Val Phe Val Phe Asn Phe His Trp Thr Lys Ser Tyr Ser 
780 785 790 

GAC TAT CGC ATA GCC TGC CTG AAG CCT GGA AAA TAC AAG GTT GCC TTG 2571 
Asp Tyr Arg Ile Ala Cys Leu Lys Pro Gly Lys Tyr Lys Val Ala Leu 
795 800 805 

GAC TCA GAT GAT CCA CTT TTT GGT GGC TTC GGG AGA ATT GAT CAT AAT 2619 
Asp Ser Asp Asp Pro Leu Phe Gly Gly Phe Gly Arg Ile Asp His Asn 
810 815 820 825 

GCC GAA TAT TTC ACC TTT GAA GGA TGG TAT GAT GAT CGT CCT CGT TCA 2667 
Ala Glu Tyr Phe Thr Phe Glu Gly Trp Tyr Asp Asp Arg Pro Arg Ser 
830 835 840 

ATT ATG GTG TAT GCA CCT TGT AAA ACA GCA GTG GTC TAT GCA CTA GTA 2715 
Ile Met Val Tyr Ala Pro Cys Lys Thr Ala Val Val Tyr Ala Leu Val 
845 850 855 

GAC AAA GAA GAA GAA GAA GAA GAA GAA GAA GAA GAA GAA GTA GCA GCA 2763 
Asp Lys Glu Glu Glu Glu Glu Glu Glu Glu Glu Glu Glu Val Ala Ala 
860 865 870 

GTA GAA GAA GTA GTA GTA GAA GAA GAA TGAACGAACT TGTGATCGCG 2810 
Val Glu Glu Val Val Val Glu Glu Glu 
875 880 

TTGAAAGATT TGAACGCTAC ATAGAGCTTC TTGACGTATC TGGCAATATT GCATCAGTCT 2870 

TGGCGGAATT TCATGTGACA CAAGGTTTGC AATTCTTTCC ACTATTAGTA GTGCAACGAT 2930 

ATACGCAGAG ATGAAGTGCT GAACAAACAT ATGTAAAATC GATGAATTTA TGTCGAATGC 2990 

TGGGACGATC GAATTCCTGC AGGCCGGGGG ACCCCTTAGT TCT 3033 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 882 amino acids 
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Met Val Tyr Thr Leu Ser Gly Val Arg Phe Pro Thr Val Pro Ser Val 
1 5 10 15 

Tyr Lys Ser Asn Gly Phe Ser Ser Asn Gly Asp Arg Arg Asn Ala Asn 
20 25 30 

Val Ser Val Phe Leu Lys Lys His Ser Leu Ser Arg Lys Ile Leu Ala 
35 40 45 

Glu Lys Ser Ser Tyr Asn Ser Glu Phe Arg Pro Ser Thr Val Ala Ala 
50 55 60 

Ser Gly Lys Val Leu Val Pro Gly Thr Gln Ser Asp Ser Ser Ser Ser 
65 70 75 80 



Ser Thr Asp Gln Phe Glu Phe Thr Glu Thr Ser Pro Glu Asn Ser Pro 
85 90 95 

Ala Ser Thr Asp Val Asp Ser Ser Thr Met Glu His Ala Ser Gln Ile 
100 105 110 

Lys Thr Glu Asn Asp Asp Val Glu Pro Ser Ser Asp Leu Thr Gly Ser 
115 120 125 

Val Glu Glu Leu Asp Phe Ala Ser Ser Leu Gln Leu Gln Glu Gly Gly 
130 135 140 

Lys Leu Glu Glu Ser Lys Thr Leu Asn Thr Ser Glu Glu Thr Ile Ile 
145 150 155 160 

Asp Glu Ser Asp Arg Ile Arg Glu Arg Gly Ile Pro Pro Pro Gly Leu 
165 170 175 

Gly Gln Lys Ile Tyr Glu Ile Asp Pro Leu Leu Thr Asn Tyr Arg Gln 
180 185 190 

His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys Lys Leu Arg Glu Ala Ile 
195 200 205 

Asp Lys Tyr Glu Gly Gly Leu Glu Ala Phe Ser Arg Gly Tyr Glu Lys 
210 215 220 

Met Gly Phe Thr Arg Ser Ala Thr Gly Ile Thr Tyr Arg Glu Trp Ala 
225 230 235 240 

Leu Gly Ala Gln Ser Ala Ala Leu Ile Gly Asp Phe Asn Asn Trp Asp 
245 250 255 

Ala Asn Ala Asp Ile Met Thr Arg Asn Glu Phe Gly Val Trp Glu Ile 
260 265 270 

Phe Leu Pro Asn Asn Val Asp Gly Ser Pro Ala Ile Pro His Gly Ser 
275 280 285 

Arg Val Lys Ile Arg Met Asp Thr Pro Ser Gly Val Lys Asp Ser Ile 
290 295 300 

Pro Ala Trp Ile Asn Tyr Ser Leu Gln Leu Pro Asp Glu Ile Pro Tyr 
305 310 315 320 

Asn Gly Ile His Tyr Asp Pro Pro Glu Glu Glu Arg Tyr Ile Phe Gln 
325 330 335 

His Pro Arg Pro Lys Lys Pro Lys Ser Leu Arg Ile Tyr Glu Ser His 
340 345 350 

Ile Gly Met Ser Ser Pro Glu Pro Lys Ile Asn Ser Tyr Val Asn Phe 
355 360 365 

Arg Asp Glu Val Leu Pro Arg Ile Lys Lys Leu Gly Tyr Asn Ala Leu 
370 375 380 

Gln Ile Met Ala Ile Gln Glu His Ser Tyr Tyr Ala Ser Phe Gly Tyr 
385 390 395 400 

His Val Thr Asn Phe Phe Ala Pro Ser Ser Arg Phe Gly Thr Pro Asp 
405 410 415 

Asp Leu Lys Ser Leu Ile Asp Lys Ala His Glu Leu Gly Ile Val Val 
420 425 430 

Leu Met Asp Ile Val His Ser His Ala Ser Asn Asn Thr Leu Asp Gly 
435 440 445 

Leu Asn Met Phe Asp Cys Thr Asp Ser Cys Tyr Phe His Ser Gly Ala 
450 455 460 

Arg Gly Tyr His Trp Met Trp Asp Ser Arg Leu Phe Asn Tyr Gly Asn 
465 470 475 480 

Trp Glu Val Leu Arg Tyr Leu Leu Ser Asn Ala Arg Trp Trp Leu Asp 
485 490 495 

Ala Phe Lys Phe Asp Gly Phe Arg Phe Asp Gly Val Thr Ser Met Met 
500 505 510 

Tyr Ile His His Gly Leu Ser Val Gly Phe Thr Gly Asn Tyr Glu Glu 
515 520 525 

Tyr Phe Gly Leu Ala Thr Asp Val Asp Ala Val Val Tyr Leu Met Leu 
530 535 540 

Val Asn Asp Leu Ile His Gly Leu Phe Pro Asp Ala Ile Thr Ile Gly 
545 550 555 560 

Glu Asp Val Ser Gly Met Pro Thr Phe Cys Ile Pro Val Gln Glu Gly 
565 570 575 

Gly Val Gly Phe Asp Tyr Arg Leu His Met Ala Ile Ala Asp Lys Arg 
580 585 590 

Ile Glu Leu Leu Lys Lys Arg Asp Glu Asp Trp Arg Val Gly Asp Ile 
595 600 605 

Val His Thr Leu Thr Asn Arg Arg Trp Ser Glu Lys Cys Val Ser Tyr 
610 615 620 

Ala Glu Ser His Asp Gln Ala Leu Val Gly Asp Lys Thr Ile Ala Phe 
625 630 635 640 

Trp Leu Met Asp Lys Asp Met Tyr Asp Phe Met Ala Leu Asp Arg Pro 
645 650 655 

Ser Thr Ser Leu Ile Asp Arg Gly Ile Ala Leu His Lys Met Ile Arg 
660 665 670 

Leu Val Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu Asn Phe Met Gly 
675 680 685 

Asn Glu Phe Gly His Pro Glu Trp Ile Asp Phe Pro Arg Ala Glu Gln 
690 695 700 

His Leu Ser Asp Gly Ser Val Ile Pro Gly Asn Gln Phe Ser Tyr Asp 
705 710 715 720 

Lys Cys Arg Arg Arg Phe Asp Leu Gly Asp Ala Glu Tyr Leu Arg Tyr 
725 730 735 

Arg Gly Leu Gln Glu Phe Asp Arg Pro Met Gln Tyr Leu Glu Asp Lys 
740 745 750 

Tyr Glu Phe Met Thr Ser Glu His Gln Phe Ile Ser Arg Lys Asp Glu 
755 760 765 

Gly Asp Arg Met Ile Val Phe Glu Lys Gly Asn Leu Val Phe Val Phe 
770 775 780 

Asn Phe His Trp Thr Lys Ser Tyr Ser Asp Tyr Arg Ile Ala Cys Leu 
785 790 795 800 

Lys Pro Gly Lys Tyr Lys Val Ala Leu Asp Ser Asp Asp Pro Leu Phe 
805 810 815 

Gly Gly Phe Gly Arg Ile Asp His Asn Ala Glu Tyr Phe Thr Phe Glu 
820 825 830 

Gly Trp Tyr Asp Asp Arg Pro Arg Ser Ile Met Val Tyr Ala Pro Cys 
835 840 845 

Lys Thr Ala Val Val Tyr Ala Leu Val Asp Lys Glu Glu Glu Glu Glu 
850 855 860 

Glu Glu Glu Glu Glu Glu Val Ala Ala Val Glu Glu Val Val Val Glu 
865 870 875 880 

Glu Glu 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 



TCATTAAAGA GGAGAAATTA ACTATGAGAG GATCTCACCA TCACCATCAC CATGGGATCT 60 

TGGCTGAAAA GTCTTCTTAC AATTCCGAAT TCCGACCTTC TACAGTTGCA GCATCGGGGA 120 

AAGTCCTTGT GCCTGGAACC CAGAGTGATA GCTCCTCATC CTCAACAAAC CAATTTGAGT 180 

TCACTGAGAC ATCTCCAGAA AATTCCCCAG CATCAACTGA TGTAGATAGT TCAACAATGG 240 

AACACGCTAG CCAGATTAAA ACTGAGAACG ATGACGTTGA GCCGTCAAGT GATCTTACAG 300 

GAAGTGTTGA AGAGCTGGAT TTTGCTTCAT CACTACAACT ACAAGAAGGT GGTAAACTGG 360 

AGGAGTCTAA AACATTAAAT ACTTCTGAAG AGACAATTAT TGATGAATCT GATAGGATCA 420 

GAGAGAGGGG CATCCCTCCA CCTGGACTTG GTCAGAAGAT TTATGAAATA GACCCCCTTT 480 

TGACAAACTA TCGTCAACAC CTTGATTACA GGTATTCACA GTACAAGAAA CTGAGGGAGG 540 

CAATTGACAA GTATGAGGGT GGTTTGGAAG CTTTTTCTCG TGGTTATGAA AAAATGGGTT 600 

TCACTCGTAG TGCTACAGGT ATCACTTACC GTGAGTGGGC TCCTGGTGCC CAGTCAGCTG 660 

CCCTCATTGG AGATTTCAAC AATTGGGACG CAAATGCTGA CATTATGACT CGGAATGAAT 720 

TTGGTGTCTG GGAGATTTTT CTGCCAAATA ATGTGGATGG TTCTCCTGCA ATTCCTCATG 780 

GGTCCAGAGT GAAGATACGT ATGGACACTC CATCAGGTGT TAAGGATTCC ATTCCTGCTT 840 

GGATCAACTA CTCTACAGCT TCCTGATGAA ATTCCATATA ATGGAATATA TTATGATCCA 900 

CCCGAAGAGG AGAGGTATAT CTTCCAACAC CCACGGCCAA AGAAACCAAA GTCGCTGAGA 960 

ATATATGAAT CTCATATTGG AATGAGTAGT CCGGAGCCTA AAATTAACTC ATACGTGAAT 1020 

TTTAGAGATG AAGTTCTTCC TCGCATAAAA AAGCTTGGGT ACAATGCGCT GCAAATTATG 1080 

GCTATTCAAG AGCATTCTTA TTATGCTAGT TTTGGTTATC ATGTCACAAA TTTTTTTGCA 1140 

CCAAGCAGCC GTTTTGGAAC GCCCGACGAC CTTAAGTCTT TGATTGATAA AGCTCATGAG 1200 

CTAGGAATTG TTGTTCTCAT GGACATTGTT CACAGCCATG CATCAAATAA TACTTTAGAT 1260 

GGACTGAACA TGTTTGACGG CACCGATAGT TGTTACTTTC ACTCTGGAGC TCGTGGTTAT 1320 

CATTGGATGT GGGATTCCCG CCTTTTTAAC TATGGAAACT GGGAGGTACT TAGGTATCTT 1380 

CTCTCAAATG CGAGATGGTG GTTGGATGAG TTCAAATTTG ATGGATTTAG ATTTGATGGT 1440 

GTGACATCAA TGATGTATAC TCACCACGGA TTATCGGTGG GATTCACTGG GAACTACGAG 1500 

GAATACTTTG GACTCGCAAC TGATGTGGAT GCTGTTGTGT ATCTGATGCT GGTCAACGAT 1560 

CTTATTCATG GGCTTTTCCC AGATGCAATT ACCATTGGTG AAGATGTTAG CGGAATGCCG 1620 

ACATTTTGTA TTCCCGTTCA AGATGGGGGT GTTGGCTTTG ACTATCGGCT GCATATGGCA 1680 

ATTGCTGATA AATGGATTGA GTTGCTCAAG AAACGGGATG AGGATTGGAG AGTGGGTGAT 1740 

ATTGTTCATA CACTGACAAA TAGAAGATGG TCGGAAAAGT GTGTTTCATA CGCTGAAAGT 1800 

CATGATCAAG CTCTAGTCGG TGATAAAACT ATAGCATTCT GGCTGATGGA CAAGGATATG 1860 

TATGATTTTA TGGCTCTGGA TAGACCGCCA ACATCATTAA TAGATCGTGG GATAGCATTG 1920 

CACAAGATGA TTAGGCTTGT AACTATGGGA TTAGGAGGAG AAGGGTACCT AAATTTCATG 1980 

GGAAATGAAT TCGGCCACCC TGAGTGGATT GATTTCCCTA GGGCTGAACA ACACCTCTCT 2040 

GATGACTCAG TAATTCCCGG AAACCAATTC AGTTATGATA AATGCAGACG GAGATTTGAC 2100 

CTGGGAGATG CAGAATATTT AAGATACCGT GGGTTGCAAG AATTTGACCG GGCTATGCAG 2160 

TATCTTGAAG ATAAATATGA GTTTATGACT TCAGAACACC AGTTCATATC ACGAAAGGAT 2220 

GAAGGAGATA GGATGATTGT ATTTGAAAAA GGAAACCTAG TTTTTGTCTT TAATTTTCAC 2280 

TGGACAAAAA GCTATTCAGA CTATCGCATA GGCTGCCTGA AGCCTGGAAA ATACAAGGTT 2340

GCCTTGGACT CAGATGATCC ACTTTTTGGT GGCTTCGGGA GAATTGATCA TAATGCCGAA 2400 

TATTTCACCT TTGAAGGATG GTATGATGAT CGTCCTCGTT CAATTATGGT GTATGCACCT 2460 

TGTAGAACAG CAGTGGTCTA TGCACTAGTA GACAAAGAAG AAGAAGAAGA AGAAGAAGAA 2520 

GAAGAAGTAG CAGTAGTAGA AGAAGTAGTA GTAGAAGAAG AATGAACGAA CTTGTG 2576 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2529 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

GGATGCTAAT GTTTCTGTAT TCTTGAAAAA GCACTCTCTT TCACGGAAGA TCTTGGCTGA 60 

AAAGTCTTCT TACAATTCCG AATCCCGACC TTCTACAGTT GCAGCATCGG GGAAAGTCCT 120 

TGTGCCTGGA AYCCAGAGTG ATAGCTCCTC ATCCTCAACA GACCAATTTG AGTTCACTGA 180 

GACATCTCCA GAAAATTCCC CAGCATCAAC TGATGTAGAT AGTTCAACAA TGGAACACGC 240 

TAGCCAGATT AAAACTGAGA ACGATGACGT TGAGCCGTCA AGTGATCTTA CAGGAAGTGT 300

TGAAGAGCTG GATTTTGCTT CATCACTACA ACTACAAGAA GGTGGTAAAC TGGAGGAGTC 360

TAAAACATTA AATACTTCTG AAGAGACAAT TATTGATGAA TCTGATAGGA TCAGAGAGAG 420 

GGGCATCCCT CCACCTGGAC TTGGTCAGAA GATTTATGAA ATAGACCCCC TTTTGACAAA 480 

CTATCGTCAA CACCTTGATT ACAGGTATTC ACAGTACAAG AAACTGAGGG AGGCAATTGA 540 

CAAGTATGAG GGTGGTTTGG AAGCTTTTTC TCGTGGTTAT GAAAAAATGG GTTTCACTCG 600 

TAGTGCTACA GGTATCACTT ACCGTGAGTG GGCTCCTGGT GCCCAGTCAG CTGCCCTCAT 660

TGGAGATTTC AACAATTGGG ACGCAAATGC TGACATTATG ACTCGGAATG AATTTGGTGT 720

CTGGGAGATT TTTCTGCCAA ATAATGTGGA TGGTTCTCCT GCAATTCCTC ATGGGTCCAG 780

AGTGAAGATA CGYATGGACA CTCCATCAGG TGTTAAGGAT TCCATTCCTG CTTGGATCAA 840 

CTACTCTTTA CAGCTTCCTG ATGAAATTCC ATATAATGGA ATATATTATG ATCCACCCGA 900

AGAGGAGAGG TATRTCTTCC AACACCCACG GCCAAAGAAA CCAAAGTCGC TGAGAATATA 960 

TGAATCTCAT ATTGGAATGA GTAGTCCGGA GCCTAAAATT AACTCATACG TGAATTTTAG 1020

AGATGAAGTT CTTCCTCGCA TAAAAAASCT TGGGTACAAT GCGGTGCAAA TTATGGCTAT 1080

TCAAGAGCAT TCTTATTATG CTAGTTTTGG TTATCATGTC ACAAATTTTT TTGCACCAAG 1140

CAGCCGTTTT GGAACGCCCG ACGACCTTAA GTCTTTGATT GATAAAGCTC ATGAGCTAGG 1200 

AATTGTTGTT CTCATGGACA TTGTTCACAG CCATGCATCA AATAATACTT TAGATGGACT 1260 

GAACATGTTT GACGGCACAG ATAGTTGTTA CTTTCACTCT GGAGCTCGTG GTTATCATTG 1320 

GATGTGGGAT TCCCGCCTCT TTAACTATGG AAACTGGGAG GTACTTAGGT ATCTTCTCTC 1380 

AAATGCGAGA TGGTGGTTGG ATGAGTTCAA ATTTGATGGA TTTAGATTTG ATGGTGTGAC 1440 

ATCAATGATG TATACTCACC ACGGATTATC GGTGGGATTC ACTGGGAACT ACGAGGAATA 1500 

CTTTGGACTC GCAACTGATG TGGATGCTGT TGTGTATCTG ATGCTGGTCA ACGATCTTAT 1560 

TCACGGGCTT TTCCCAGATG CAATTACCAT TGGTGAAGAT GTTAGCGGAA TGCCGACATT 1620 

TTGTATTCCC GTTCAAGATG GGGGTGTTGG CTTTGACTAT CGGCTGCATA TGGCAATTGC 1680 

TGATAAATGG ATTGAGTTGC TCAAGAAACG GGATGAGGAT TGGAGAGTGG GTGATATTGT 1740 

TCATACACTG ACAAATAGAA GATGGTCGGA AAAGTGTGTT TCATMCGCTG AAAGTCATGA 1800

TCAAGCTCTA GTCGGTGATA AAACTATAGC ATYCTGGCTG ATGGACAAGG ATATGTATGA 1860

TTTTATGGCT CTGGATAGAC CGYCAACAYC ATTAATAGAT CGTGGGATAG CATTGCACAA 1920

GATGATTAGG CTTGTAACTA TGGGATTAGG AGGAGAAGGG TACCTAAATT TCATGGGAAA 1980

TGAATTCGGC CACCCTGAGT GGATTGATTT CCCTAGGGCT GARCAACACC TCTCTGATGG 2040

CTCAGTAATT CCCGGAAACC AATTCAGTTA TGATAAATGC AGACGGAGAT TTGACCTGGG 2100

AGATGCAGAA TATTTAAGAT ACCATGGGTT GCAAGAATTT GACCGGGCTA TGCAGTATCT 2160

TGAAGATAAA TATGAGTTTA TGACTTCAGA ACACCAGTTC ATATCACGAA AGGATGAAGG 2220

AGATAGGATG ATTGTATTTG AAARAGGAAA CCTAGTTTTT GTCTTTAATT TTCACTGGAC 2280

AAATAGCTAT TCAGACTATC GCATAGGCTG CCTGAAGCCT GGAAAATACA AGGTTGGCTT 2340

GGACTCAGAT GATCCACTTT TTGGTGGCTT CGGGAGAATT GATCATAATG CCGAATATTT 2400

CACCTCTGAA GGATCGTATG ATGATCGTCC TCGTTCAATT ATGGTGTATG CACCTAGTAG 2460

AACAGCAGTG GTCTATGCAC TAGTAGACAA ANTAGAAGNA GAAGAAGAAG AAGAANCCGN 2520

NGAAGAATT 2529



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3231 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:



GATTTAATAC GACTCACTAT AGGGATTTTT TTTTTTTTTT TTTTAAAAAC CTCCTCCACT 60

CAGTCTTGGG ATCTCTCTCT CTCTTCACGC TTCTCTTGGG GCCTTGAACT CAGCAATTTG 120

ACACTCAGTT AGTTACACTC CTATCACTCA TCAGATCTCT ATTTTTTCTC TTAATTCCAA 180

CCAAGGAATG AATTAAAAGA TTAGATTTGA AGGAGAGAAG AAGAAAGATG GTGTATACAC 240

TCTCTGGAGT TCGTTTTCCT ACTGTTCCAT CAGTGTACAA ATCTAATGGA TTCAGCAGTA 300

ATGGTGATCG GAGGAATGCT AATGTTTCTG TATTCTTGAA AAAGCACTCT CTTTCACGGA 360

AGATCTTGGC TGAAAAGTCT TCTTACGATT CCGAATCCCG ACCTTCTACA GTTGCAGCAT 420

CGGGGAAAGT CCTTGTACCT GGAATCCAGA GTGATAGCTC CTCATCCTCA ACAGACCAAT 480

TTGAGTTCAC TGAGACAGCT CCAGAAAATT CCCCAGCATC AACTGATGTG GATAGTTCAA 540

CAATGGAACA CGCTAGCCAG ATTAAAACTG AGAACGATGA CGTTGAGCCG TCAAGTGATC 600

TTACAGGAAG TGTTGAAGAG TTGGATTTTG CTTCATCACT ACAACTACAA GAAGGTGGTA 660

AACTGGAGGA GTCTAAAACA TTAAATACTT CTGAAGAGAC AATTATTGAT GAATCTGATA 720

GGATCAGAGA GAGGGGCATC CCTCCACCTG GACTTGGTCA GAAGATTTAT GAAATAGACC 780



CCCTTTTGAC AAACTATCGT CAACACCTTG [illegible] [illegible] [illegible] 840

GGGAGGCAAT TGACAAGTAT GAGGGTGGTT [illegible] [illegible] [illegible] 900

TGGGTTTCAC TCGTAGTGCT ACAGGTATCA [illegible] [illegible] [illegible] 960

CAGCTGCTCT CATTGGAGAT TTCAACAATT [illegible] [illegible] [illegible] 1020

ATGAATTTGG TGTCTGGGAG ATTTTTCTGC CAAATAATGT GGATGGTTCT CCTGCAATTC 1080

CTCATGGGTC CAGAGTGAAG ATACGCATGG ACACTTCATC AGGTGTTAAG GATTCCATTC 1140

CTGCTTGGAT CAACTACTCT TTACAGCTTC CTGATGAAAT TCCATATAAT GGAATATATT 1200

ATGATCCACC CGAAGAGGAG AGGTATGTCT TCCAACACCC ACGGCCAAAG AAACCAAAGT 1260

CGCTGAGAAT ATATGAATCT CATATTGGAA [illegible] GGAGCCTAAA ATTAACTCAT 1320

ACGTGAATTT TAGAGATGAA GTTCTTCCTC [illegible] CCTTGGGTAC AATGCGGTGC 1380

AAATTATGGC TATTCAAGAG CATTCTTATT [illegible] TGGTTATCAT GTCACAAATT 1440

TTTTTGCACC AAGCAGCCGT TTTGGAACGC [illegible] TAAGTCTTTG ATTGATAAAG 1500

CTCATGAGCT AGGAATTGTT GTTCTCATGG [illegible] CAGCCATGCA TCAAATAATA 1560

CTTTAGATGG ACTGAACATG TTTGACGGCA [illegible] TTACTTTCAC TCTGGAGCTC 1620

GTGGTTATCA TTGGATGTGG GATTCCCGCC [illegible] TGGAAACTGG GAGGTACTTA 1680

GGTATCTTCT CTCAAATGCG AGATGGTGGT [illegible] CAAATTTGRT GGATTTAGAT 1740

TTGATGGTGT GACATCAATG ATGTATACTC [illegible] ATCGGTGGGA TTCACTGGGA 1800

ACTACGAGGA ATACTTTGGA CTCGCAACTG [illegible] TGCCGTGTAT CTGATGCTGG 1860

CCAACGATCT TATTCATGGG CTTTTCCCAG [illegible] CATTGGTGAA GATGTTAGCG 1920

GAATGCCGAC ATTTTGTATT CCCGTTCAAG [illegible] TGGCTTTGAC TATCGGCTGC 1980

ATATGGCAAT TGCTGATAAA TGGATTGAGT [illegible] ACGGGATGAG GATTGGAGAG 2040

TGGGTGATAT TGTTCATACA CTGACAAATA [illegible] GGAAAAGTGT GTTTCATACG 2100

CTGAAAGTCA TGATCAAGCT CTAGTCGGTG [illegible] AGCATTCTGG CTGATGGACA 2160

AGGATATGTA TGATTTTATG GCTTTGGATA [illegible] ATCATTAATA GATCGTGGGA 2220

TAGCATTGCA CAAGATGATT AGGCTTGTAA [illegible] AGGAGGAGAA GGGTACCTAA 2280

ATTTCATGGG AAATGAATTC GGCCACCCTG [illegible] TTTCCCTAGG GCTGAACAAC 2340

ACCTCTCTGA TGGCTCAGTA ATTCCCGGAA [illegible] TTATGATAAA TGCAGACGGA 2400

GATTTGACCT GGGAGATGCA GAATATTTAA [illegible] GTTGCAAGAA TTTGACCGGG 2460

CTATGCAGTA TCTTGAAGAT AAATATGAGT TTATGACTTC AGAACACCAG TTCATATCAC 2520

GAAAGGATGA AGGAGATAGG ATGATTGTAT TTGAAAAAGG AAACCTAGTT TTTGTCTTTA 2580

ATTTTCACTG GACAAAAAGC TATTCAGACT ATCGCATAGG CTGGCTGAAG CCTGGAAAAT 2640

ACAAGGTTGC CTTGGACTCA GATGATCCAC TTTTTGGTGG CTTCGGGAGA ATTGATCATA 2700

ATGCCGAATG TTTCACCTTT GAAGGATGGT ATGATGATCG TCCTCGTTCA ATTATGGTGT 2760

ATGCACCTAG TAGAACAGCA GTGGTCTATG CACTAGTAGA CAAAGAAGAA GAAGAAGAAG 2820

AAGTAGCAGT AGTAGAAGAA GTAGTAGTAG AAGAAGAATG AACGAACTTG TGATCGCGTT 2880

GAAAGATTTG AACGCTACAT AGAGCTTCTT GACGTATCTG GCAATATTGC ATCAGTCTTG 2940

GCGGAATTTC ATGTGACAAA AGGTTTGCAA TTCTTTCCAC TATTAGTAGT GCAACGATAT 3000

ACGCAGAGAT GAAGTGCTGA ACAAACATAT GTAAAATCGA TGAATTTATG TCGAATGCTG 3060

GGACGGGCTT CAGCAGGTTT TGCTTAGTGA GTTCTGTAAA TTGTCATCTC TTTANATGTA 3120

CAGCCCACTA GAAATCAATT ATGTGAGACC TAAAAAACAA TAACCATAAA ATGGAAATAG 3180

TGCTGATCTA ATGATGTTTT AANCCNNNNA AAAAAAAAAA AAAAACTCGA G 3231

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2578 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 



TCATTAAAGA GGAGAAATTA ACTATGAGAG GATCTCACCA TCACCATCAC CATGGGATCT 60

TGGCTGAAAA GTCTTCTTAC AATTCCGAAT TCCGACCTTC TACAGTTGCA GCATCGGGGA 120

AAGTCCTTGT GCCTGGAACC CAGAGTGATA GCTCCTCATC CTCAACAAAC CAATTTGAGT 180

TCACTGAGAC ATCTCCAGAA AATTCCCCAG CATCAACTGA TGTAGATAGT TCAACAATGG 240

AACACGCTAG CCAGATTAAA ACTGAGAACG ATGACGTTGA GCCGTCAAGT GATCTTACAG 300

GAAGTGTTGA AGAGCTGGAT TTTGCTTCAT CACTACAACT ACAAGAAGGT GGTAAACTGG 360

AGGAGTCTAA AACATTAAAT ACTTCTGAAG AGACAATTAT TGATGAATCT GATAGGATCA 420

GAGAGAGGGG CATCCCTCCA CCTGGACTTG GTCAGAAGAT TTATGAAATA GACCCCCTTT 480

TGACAAACTA TCGTCAACAC CTTGATTACA GGTATTCACA GTACAAGAAA CTGAGGGAGG 540

CAATTGACAA GTATGAGGGT GGTTTGGAAG CTTTTTCTCG TGGTTATGAA AAAATGGGTT 600

TCACTCGTAG TGCTACAGGT ATCACTTACC GTGAGTGGGC TCCTGGTGCC CAGTCAGCTG 660

CCCTCATTGG AGATTTCAAC AATTGGGACG CAAATGCTGA CATTATGACT CGGAATGAAT 720

TTGGTGTCTG GGAGATTTTT CTGCCAAATA ATGTGGATGG TTCTCCTGCA ATTCCTCATG 780

GGTCCAGAGT GAAGATACGT ATGGACACTC CATCAGGTGT TAAGGATTCC ATTCCTGCTT 840

GGATCAACTA CTCTTCACAG CTTCCTGATG AAATTCCATA TAATGGAATA TATTATGATC 900

CACCCGAAGA GGAGAGGTAT ATCTTCCAAC ACCCACGGCC AAAGAAACCA AAGTCGCTGA 960

GAATATATGA ATCTCATATT GGAATGAGTA GTCCGGAGCC TAAAATTAAC TCATACGTGA 1020

ATTTTAGAGA TGAAGTTCTT CCTCGCATAA AAAAGCTTGG GTACAATGCG GTGCAAATTA 1080

TGGCTATTCA AGAGCATTCT TATTATGCTA GTTTTGGTTA TCATGTCACA AATTTTTTTG 1140 

CACCAAGCAG CCGTTTTGGA ACGCCCGACG ACCTTAAGTC TTTGATTGAT AAAGCTCATG 1200 

AGCTAGGAAT TGTTGTTCTC ATGGACATTG TTCACAGCCA TGCATCAAAT AATACTTTAG 1260 

ATGGACTGAA CATGTTTGAC GGCACCGATA GTTGTTACTT TCACTCTGGA GCTCGTGGTT 1320 

ATCATTGGAT GTGGGATTCC CGCCTTTTTA ACTATGGAAA CTGGGAGGTA CTTAGGTATC 1380 

TTCTCTCAAA TGCGAGATGG TGGTTGGATG AGTTCAAATT TGATGGATTT AGATTTGATG 1440 

GTGTGACATC AATGATGTAT ACTCACCACG GATTATCGGT GGGATTCACT GGGAACTACG 1500 

AGGAATACTT TGGACTCGCA ACTGATGTGG ATGCTGTTGT GTATCTGATG CTGGTCAACG 1560 

ATCTTATTCA TGGGCTTTTC CCAGATGCAA TTACCATTGG TGAAGATGTT AGCGGAATGC 1620 

CGACATTTTG TATTCCCGTT CAAGATGGGG GTGTTGGCTT TGACTATCGG CTGCATATGG 1680 

CAATTGCTGA TAAATGGATT GAGTTGCTCA AGAAACGGGA TGAGGATTGG AGAGTGGGTG 1740 

ATATTGTTCA TACACTGACA AATAGAAGAT GGTCGGAAAA GTGTGTTTCA TACGCTGAAA 1800 

GTCATGATCA AGCTCTAGTC GGTGATAAAA CTATAGCATT CTGGCTGATG GACAAGGATA 1860

TGTATGATTT TATGGCTCTG GATAGACCGC CAACATCATT AATAGATCGT GGGATAGCAT 1920

TGCACAAGAT GATTAGGCTT GTAACTATGG GATTAGGAGG AGAAGGGTAC CTAAATTTCA 1980

TGGGAAATGA ATTCGGCCAC CCTGAGTGGA TTGATTTCCC TAGGGCTGAA CAACACCTCT 2040

CTGATGACTC AGTAATTCCC GGAAACCAAT TCAGTTATGA TAAATGCAGA CGGAGATTTG 2100

ACCTGGGAGA TGCAGAATAT TTAAGATACC GTGGGTTGCA AGAATTTGAC CGGGCTATGC 2160

AGTATCTTGA AGATAAATAT GAGTTTATGA CTTCAGAACA CCAGTTCATA TCACGAAAGG 2220

ATGAAGGAGA TAGGATGATT GTATTTGAAA AAGGAAACCT AGTTTTTGTC TTTAATTTTC 2280

ACTGGACAAA AAGCTATTCA GACTATCGCA TAGGCTGCCT GAAGCCTGGA AAATACAAGG 2340 

TTGCCTTGGA CTCAGATGAT CCACTTTTTG GTGGCTTCGG GAGAATTGAT CATAATGCCG 2400 

AATATTTCAC CTTTGAAGGA TGGTATGATG ATCGTCCTCG TTCAATTATG GTGTATGCAC 2460 

CTTGTAGAAC AGCAGTGGTC TATGCACTAG TAGACAAAGA AGAAGAAGAA GAAGAAGAAG 2520 

AAGAAGAAGT AGCAGTAGTA GAAGAAGTAG TAGTAGAAGA AGAATGAACG AACTTGTG 2578 
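
Each full row of the listings above ends with a running total of the bases printed so far, so a row can be cross-checked against its neighbours. The following is a minimal sketch of such a check, not part of the listing itself: the helper name `check_rows` is hypothetical, and the last two rows of SEQ ID NO: 19 are used only as sample input, with the count of the preceding row (2460) supplied as the starting offset.

```python
def check_rows(rows, start=0):
    """Return (computed, printed) pairs for rows whose printed total is wrong.

    Each row is assumed to hold whitespace-separated base groups followed
    by the running position count, as in the listing above.
    """
    problems = []
    total = start
    for row in rows:
        *groups, printed = row.split()
        # Add the number of bases in this row to the running total.
        total += sum(len(group) for group in groups)
        if total != int(printed):
            problems.append((total, int(printed)))
    return problems


# Sample input: the final two rows of SEQ ID NO: 19 as listed above.
rows = [
    "CTTGTAGAAC AGCAGTGGTC TATGCACTAG TAGACAAAGA AGAAGAAGAA GAAGAAGAAG 2520",
    "AAGAAGAAGT AGCAGTAGTA GAAGAAGTAG TAGTAGAAGA AGAATGAACG AACTTGTG 2578",
]

print(check_rows(rows, start=2460))
```

An empty result means every printed total agrees with the bases actually present, and the final total should equal the stated LENGTH of the sequence.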



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
AATTTYATGG GNAAYGARTT YGG 23
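
SEQ ID NO: 20 contains the ambiguity symbols Y, N and R, so it describes a pool of related oligonucleotides rather than a single sequence. As an illustrative sketch only, assuming the standard IUPAC nucleotide code table (Y = C or T, R = A or G, N = any base), the pool size and its members can be enumerated as follows. The function names `degeneracy` and `expand` are hypothetical and not taken from the listing.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (assumed, not stated in the listing).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(seq):
    """Number of distinct unambiguous sequences represented by seq."""
    count = 1
    for base in seq:
        count *= len(IUPAC[base])
    return count


def expand(seq):
    """List every unambiguous sequence represented by seq."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


# SEQ ID NO: 20 with the group spacing removed.
SEQ_ID_NO_20 = "AATTTYATGGGNAAYGARTTYGG"

print(len(SEQ_ID_NO_20), degeneracy(SEQ_ID_NO_20))
```

Under these assumptions the 23-base sequence expands to 64 members. One of them, AATTTCATGGGAAATGAATTCGG, appears to coincide with a stretch of SEQ ID NO: 19 as listed above.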